Learning goals of the first week

  1. What are R, RStudio, and Git?
  2. How to use R as a basic calculator?
  3. What are Objects in R?
  4. Something about R Data structures
    • Vectors
    • Matrices
    • Data frames
    • Arrays
    • Lists
  5. Selecting Elements in a vector, matrix or data frame
  6. Working with data frames
    • Loading data
    • Manipulating data
  7. Calculating Measures of Central Tendency and Variability
  8. Plotting data and saving plots

What are R, RStudio, and Git?

What is R?

R is a programming language designed to help you perform statistical analysis, create graphics, and later on write your own statistical software. R is becoming increasingly popular and knowledge of R will help you on the job market. R is probably the most versatile statistical tool out there (and it’s free and open-source so you can literally use it anywhere). It is for example used in all fields of academia, from biology to economics, and outside academia including

  • Wall Street
  • The Economist
  • BBC
  • Google Analytics
  • NY Times graphics department

What is RStudio?

RStudio is a great graphical user interface for R. In recent years, a growing number of features have been added to this graphical user interface, which makes it the preferred choice for learning R, especially among beginners. You can think about it as R being the engine of the car and RStudio being the dashboard.

What are RStudio Projects?

RStudio projects make it straightforward to divide your work into multiple contexts, each with its own working directory, workspace, history, and source documents. A project is basically a folder on your computer that holds all the files relevant to a particular piece of work. Working in RStudio Projects has multiple advantages:

  • Once an RStudio Project is set up, you do not have to worry about your working directory anymore.
  • When opening an RStudio Project, a new R session (process) is started. This makes sure that things you do in different projects do not interfere with each other.
  • You can open and work with multiple RStudio projects at the same time.
  • RStudio projects can easily be exported to and imported from GitHub.

What are Git and GitHub?

Git is a version control system that makes it easy to track changes and work on code collaboratively. GitHub is a hosting service for git. You can think of it as a public Dropbox for code but on steroids. With version control, you will build your projects step-by-step, be able to come back to any version of the project, and accompany everything with human-readable messages.

As a student, you even get unlimited private repositories which you can use if you don’t feel like sharing your code with the rest of the world (yet). We will use private repositories to distribute code and assignments to you. And you will use it to keep track of your code and collaborate in teams.

With git, writing code for a project will look somewhat like this:

What is a Git Repository?

A Git repository is a space where you store and manage a project. It contains all of your project’s files and stores each file’s revision history. It’s common to refer to a repository as a repo.

We will use one repository for each lab and one repository for each homework assignment. You can directly import (“pull”) repositories via RStudio and save them on your computer. If you changed something in your project, you can easily upload (“push”) the new version to GitHub. GitHub will keep track of all changes you made over time within your project.

Workflow overview

Our workflow may seem a bit tricky at first, but we are sure that you will handle it with ease very soon. We assume that by now you have downloaded and installed R and RStudio and have a personal GitHub account.

The course has its own page on GitHub, which you can find here: https://github.com/uni-mannheim-qm-2024. This is the place where you can find all relevant material for the lab sessions. It is also the place where you download and hand in your homework assignments.

So how does this work?

Get the URL of the repo for the current week

Go to https://github.com/uni-mannheim-qm-2024 and click on the repository for the current week (this week, this is called week01_introduction). Now, click on the green Clone or download button and select Use HTTPS (this might already be selected by default, and if it is, you’ll see the text Clone with HTTPS as in the image below). Click on the clipboard icon to copy the repo URL.

Import the repository in RStudio

  1. Open RStudio.
  2. Click on File on the top bar and select New Project....

  3. Select Version Control.
  4. In the next window, select Git.
  5. In the final window, paste the repo URL you grabbed from GitHub in the Repository URL window. Click on Browse to select the folder on your computer where you want to store the project.
  6. Click on Create Project.

Get working

  1. Open the .Rmd file that is stored in the project (in week 1, this is called QM2024_Week01.Rmd).

The RStudio interface

The RStudio interface has four panes:

  • Editor: This is where you usually code. You can either use .Rmd (R Markdown) or .R (plain R code) files.
  • Console: This is where the results appear once you execute your R code. You can also type R code directly into the console and execute it. However, code typed into the console is not saved, which is why we usually work in the Editor.
  • Environment: Here you have an overview of all the objects currently loaded in your environment. You will learn more about objects later in the course.
  • Files, Plots, Packages, Help, Viewer: Plots and other things will appear here, don’t worry too much about it for the moment.

Some remarks on ChatGPT (and other large language models)

You have probably all tried out ChatGPT, and yes, it is impressive! Are you allowed to use ChatGPT and its alternatives for your assignments in this course? And if yes, do we encourage it? Here is our view on these questions:

  1. Yes, you are allowed to use ChatGPT. In fact, we neither see a way to fully prevent you from using it, nor do we think that trying to do so would be reasonable.

  2. In our assessment, large language models such as ChatGPT can be extremely helpful to those who know what they are doing, but they are not helpful at all if you have no idea what you are doing. That means: you yourself need a good understanding of what you want to do (this mostly refers to the lecture material), and a good understanding of how R works (this refers mostly to what you learn in this course) to productively work with ChatGPT. It will only be helpful to you if (1) you write precise prompts and (2) you are able to critically evaluate ChatGPT’s responses (and spot the errors it makes). This is only possible if you learn quantitative methods and R by yourself first. Because of that, our advice is the following:

  • For coding: Feel free to try out whether ChatGPT is able to solve specific problems that you encounter. But never let ChatGPT replace your own critical thinking. You always want to understand the code you use, no matter whether it comes from our material or from ChatGPT.
  • For answering specific questions (e.g. interpretation of a regression result): It may be interesting to check how ChatGPT would answer a particular question, but you really should think about the question and write the answer yourself. If you want to ask ChatGPT, do so only after you have come up with your own answer.
  • We assume that all documents bearing your name were written by you and are the product of your own critical thinking. It is a matter of academic integrity that this is actually the case.

R as a basic calculator

Enough preparation, let’s finally dive into R!

R can perform basic math operations. Here are some examples:

1 + 1
## [1] 2

Some more calculations:

2 - 3
## [1] -1
4 * 5
## [1] 20
2^2
## [1] 4
4 / 2
## [1] 2
2^(1 / 2)
## [1] 1.414214

R follows the standard order of operations. Use parentheses to control which parts of an expression are evaluated first.

((2 + 2) * 2)^2
## [1] 64

This should give the same result as before.

(4 * 2)^2
## [1] 64

But this of course gives a different result:

(2 + 2 * 2)^2
## [1] 36

You can also use other math functions you know from your calculator:

this is \(\sqrt{2}\)

sqrt(2)
## [1] 1.414214

when you do not specify the base, R uses the natural log with base \(e\), i.e. \(\log_e(10)\)

log(10)
## [1] 2.302585

but R can also use a different (virtually any) base, e.g. \(\log_{10}(10)\)

log(10, base = 10)
## [1] 1

or with base = 5, i.e. \(\log_5(10)\)

log(10, 5)
## [1] 1.430677
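
The base argument follows the usual change-of-base rule, i.e. \(\log_b(x) = \log_e(x) / \log_e(b)\). Here is a small sketch to check this yourself. The object names log_base5 and log_ratio are made up for this illustration; log10() and log2() are shortcut functions from base R.

```r
# Change of base: log_5(10) is the same as log(10) / log(5)
log_base5 <- log(10, base = 5)
log_ratio <- log(10) / log(5)

all.equal(log_base5, log_ratio)
## [1] TRUE

# Shortcuts for two common bases
log10(100)
## [1] 2
log2(8)
## [1] 3
```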

Pro tip: Always close your parentheses!

  • If you encounter an error when running your code, it is often a missing parenthesis, brace or bracket.
  • RStudio highlights your paired parentheses. This is really nice and helpful!

Make use of comments

It is hard to understand pure code, especially for someone who did not write it (and future-you will also have a hard time understanding it).

Pro tip: Add comments to your code, describing what you are doing and why you are doing it.

With comments:

  • Other people can understand your code (for example us when we are going through your Homework assignments or your classmates when you are talking about your work).
  • You can remember what you were doing when you reopen your code after weeks (e.g., to prepare the data essay at the end of the semester).

So how can I add comments?

  • Begin a comment with the # symbol.
  • Everything on that line after the # will be commented out.
  • This means that if you send the script to the R console, the console will not run this text.

# this is a comment

1 + 1 # This line runs
## [1] 2
# 1 + 1 This line does not run

  • Indent your scripts (both code and comments) using spaces so that they are readable.
  • Try to code according to Google’s R Style Guide.

Good coding style is like using correct punctuation.
Youcanmanagewithoutitbutitsuremakesthingseasiertoread.. – Hadley Wickham

Objects in R

But I already do have a calculator. Why do I need R?

R is so much more! R is an object-oriented programming language.

What is an “object” in R?

  • An object is a named container that stores the data you want to use.

What kind of data can I store as an object?

  • In R there are three main types of data you can store:
    • Numeric (numbers)
    • Character (letters/words/sentences/texts, called strings)
    • Logical or Boolean (TRUE/FALSE statements)
  • These are the types you will often encounter.
  • However, there are many other possible types.

So how can I get information into an object?

  • Store an object by using <- as assignment operator

Examples:

lucky_number <- 7
 
# Now we created a (numeric) object called "lucky_number"

lucky_number
## [1] 7

The class() command lets us check the type of an object:

lucky_number <- 7

class(lucky_number)
## [1] "numeric"

Let’s see how this works live, this time with a character object:

firstname <- "" # This is a character object

firstname
## [1] ""
class(firstname)
## [1] "character"
lastname <- ""

lastname
## [1] ""
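
The third main type, logical, works the same way. The following is a minimal sketch; the object name likes_r is made up for this illustration.

```r
# A logical object can only be TRUE or FALSE
likes_r <- TRUE

likes_r
## [1] TRUE
class(likes_r)
## [1] "logical"

# Comparisons also return logical values
lucky_number <- 7

lucky_number > 5
## [1] TRUE
```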

Exercise I: Creating objects

Your turn: Here is your very first exercise!

Pro tip: Copy the lines of code that worked for something similar. Then, adjust the code according to your problem. That’s how coding works most of the time!

Create three objects:

1. `my_lucky_number` should contain your lucky number.
2. `my_firstname` should contain your firstname.
3. `my_lastname` should contain your lastname.

After you created the objects, call them separately. Don’t forget to add comments to your code.

Data Structures

What kind of data can I store in R? R offers different types of objects that can hold different types and sets of data:

  • Scalar: numbers, characters, logical values
  • Vector: sets of scalars (thus, numbers, characters, logical)
  • Matrix: two-dimensional set of scalars of same type
  • Data frame: collections of vectors of (possibly) different types
  • Array: multidimensional set of scalars of same type
  • List: combinations of scalars, vectors, matrices…

We will go through all of these object types below. On top of that we will also learn how to calculate the measures of central tendency and variability with vectors.

Data Structures - Vectors

Let’s start with vectors. We want a vector of the numbers 1, 2, 3, 4 and 5. How do I assign this set of numbers to a vector?

The c() function combines single values to a vector:

example_vec <- c(1, 2, 3, 4, 5)

example_vec
## [1] 1 2 3 4 5

This also works for characters/strings:

country_code <- c("DE", "FR", "NL", "US", "UK")

country_code
## [1] "DE" "FR" "NL" "US" "UK"

And it also works for a combination of numbers and characters:

example_vec2 <- c("Welcome", "to", "the", "lab", "in", "A", 5, "or", "B", 6)

example_vec2
##  [1] "Welcome" "to"      "the"     "lab"     "in"      "A"       "5"      
##  [8] "or"      "B"       "6"

What if we start with numbers?

example_vec3 <- c(1, 2, 3, 4, 5, "R can count!")

example_vec3
## [1] "1"            "2"            "3"            "4"            "5"           
## [6] "R can count!"

Note that if a vector contains even one character element, R will turn ALL values into character data! (You can tell by the quotes around the values.)

Let’s check the type of data by using the class() command on example_vec3.

example_vec3 <- c(1, 2, 3, 4, 5, "R can count!")

class(example_vec3)
## [1] "character"
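
More generally, R converts a mixed vector to the most flexible type it contains: logical values become numbers when combined with numbers, and everything becomes character as soon as one string is involved. The sketch below illustrates this with made-up object names (mixed_num, number_strings) and also shows how as.numeric() converts number-like strings back.

```r
# Logical values are turned into numbers (TRUE = 1, FALSE = 0)
mixed_num <- c(1, TRUE, FALSE)

mixed_num
## [1] 1 1 0
class(mixed_num)
## [1] "numeric"

# Numbers stored as character can be converted back with as.numeric()
number_strings <- c("1", "2", "3")

as.numeric(number_strings)
## [1] 1 2 3
```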

You can use mathematical functions on each element in numeric vectors/matrices etc.

example_vec <- c(1, 2, 3, 4, 5)

sqrt(example_vec) # Take the square root of each element in example_vec
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068

What about multiplication?

example_vec <- c(1, 2, 3, 4, 5)

example_vec * 10
## [1] 10 20 30 40 50
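
Arithmetic also works between two vectors of the same length. In that case, R pairs the elements up by position. A short sketch with two made-up vectors, prices and quantities:

```r
prices <- c(2, 4, 6)
quantities <- c(10, 20, 30)

# First element with first element, second with second, and so on
prices + quantities
## [1] 12 24 36
prices * quantities
## [1]  20  80 180
```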

There are also some functions that you can use on the whole vector.

example_vec <- c(1, 2, 3, 4, 5)

sum(example_vec) # Question: What does sum() do?
## [1] 15
length(example_vec) # Question: What does length() do?
## [1] 5

Data Structures - Matrices

Matrices in R are two-dimensional table objects. In R, matrices are always row by column (think of mnemonics like roller coaster, Roman Catholic or Ray Charles).

In a matrix, all data must be of the same type. If you mix numeric and character entries, the matrix will be all character just like in a vector.

How do I create a matrix in R?

example_mat1 <- matrix(c(1, 2, 3, 4, 5, 6),
  nrow = 3,
  ncol = 2
)

example_mat1 # How did R fill the numbers in the matrix?
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

You could also change the options and let R fill the matrix by rows (instead of columns):

example_mat2 <- matrix(c(1, 2, 3, 4, 5, 6),
  nrow = 3,
  ncol = 2,
  byrow = TRUE
)

example_mat2 # See the difference?
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6

Or you could create a matrix from different vectors by using column-bind on two or more vectors. It works similarly to the c() function but with vectors as input instead of scalars.

Let’s first create two vectors of the same length:

vec1 <- c(1, 2, 3, 4, 5, 6)

vec2 <- c(7, 8, 9, 10, 11, 12)

# And now column-bind - cbind() - the two vectors.

example_mat3 <- cbind(vec1, vec2)

example_mat3
##      vec1 vec2
## [1,]    1    7
## [2,]    2    8
## [3,]    3    9
## [4,]    4   10
## [5,]    5   11
## [6,]    6   12

Similarly, we can row-bind – rbind() – the two vectors:

vec1 <- c(1, 2, 3, 4, 5, 6)

vec2 <- c(7, 8, 9, 10, 11, 12)

example_mat4 <- rbind(vec1, vec2)

example_mat4
##      [,1] [,2] [,3] [,4] [,5] [,6]
## vec1    1    2    3    4    5    6
## vec2    7    8    9   10   11   12
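
To check the "row by column" rule, you can ask R for the dimensions of a matrix. This sketch uses the base R functions dim(), nrow() and ncol(); the object names col_bound and row_bound are chosen just for this illustration.

```r
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)

col_bound <- cbind(vec1, vec2) # 6 rows, 2 columns
row_bound <- rbind(vec1, vec2) # 2 rows, 6 columns

# dim() always reports rows first, then columns
dim(col_bound)
## [1] 6 2
dim(row_bound)
## [1] 2 6

nrow(col_bound)
## [1] 6
ncol(col_bound)
## [1] 2
```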

Data Structures - Data frames

Data frames are two-dimensional table objects, just like matrices. Most data you will analyze in R will be in this form.

You can create data frames from vectors with data.frame(), which works much like cbind():

vec1 <- c(1, 2, 3, 4, 5, 6)

vec2 <- c(7, 8, 9, 10, 11, 12)

example_df1 <- data.frame(vec1, vec2)

example_df1
##   vec1 vec2
## 1    1    7
## 2    2    8
## 3    3    9
## 4    4   10
## 5    5   11
## 6    6   12

However, data frames are always column-bound vectors. In a data frame, everything within a column has to be of the same data type. But you can mix data types between columns:

vec1 <- c(1, 2, 3, 4, 5, 6)

vec2 <- c(7, 8, 9, 10, 11, 12)

vec3 <-
  c(
    "First Row",
    "Second Row",
    "Third Row",
    "Fourth Row",
    "Fifth Row",
    "Sixth Row"
  )

example_df2 <- data.frame(vec1, vec2, vec3)

example_df2
##   vec1 vec2       vec3
## 1    1    7  First Row
## 2    2    8 Second Row
## 3    3    9  Third Row
## 4    4   10 Fourth Row
## 5    5   11  Fifth Row
## 6    6   12  Sixth Row
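
If you want to verify which type each column has, one option is to apply class() to every column with sapply(). This is only a sketch with a small made-up data frame (example_df, column_classes). stringsAsFactors = FALSE is set explicitly here because R versions before 4.0 would otherwise store text columns as factors.

```r
example_df <- data.frame(
  numbers = c(1, 2, 3),
  labels = c("First Row", "Second Row", "Third Row"),
  stringsAsFactors = FALSE # keep text as character
)

# Apply class() to each column of the data frame
column_classes <- sapply(example_df, class)

column_classes # "numbers" is numeric, "labels" is character
```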

You can also name your columns/variables. Either when creating your data frame:

vec1 <- c(1, 2, 3, 4, 5, 6)

vec2 <- c(7, 8, 9, 10, 11, 12)

vec3 <-
  c(
    "First Row",
    "Second Row",
    "Third Row",
    "Fourth Row",
    "Fifth Row",
    "Sixth Row"
  )

example_df3 <- data.frame(
  variable_1to6 = vec1,
  variable_7to12 = vec2,
  variable_rows = vec3
)

example_df3
##   variable_1to6 variable_7to12 variable_rows
## 1             1              7     First Row
## 2             2              8    Second Row
## 3             3              9     Third Row
## 4             4             10    Fourth Row
## 5             5             11     Fifth Row
## 6             6             12     Sixth Row

Or by renaming the columns of an existing data frame.

vec1 <- c(1, 2, 3, 4, 5, 6)

vec2 <- c(7, 8, 9, 10, 11, 12)

vec3 <-
  c(
    "First Row",
    "Second Row",
    "Third Row",
    "Fourth Row",
    "Fifth Row",
    "Sixth Row"
  )

example_df3 <- data.frame(vec1, vec2, vec3)


# Rename the variables of an existing data frame

names(example_df3) <- c("variable.1", "variable.2", "variable.3")

example_df3
##   variable.1 variable.2 variable.3
## 1          1          7  First Row
## 2          2          8 Second Row
## 3          3          9  Third Row
## 4          4         10 Fourth Row
## 5          5         11  Fifth Row
## 6          6         12  Sixth Row

We can also add a new variable to an existing data frame. We simply create a data frame which consists of a data frame and a vector:

example_df4 <-
  data.frame(example_df3, 
             variable_4 = c(90, 91, 92, 93, 94, 95))

example_df4
##   variable.1 variable.2 variable.3 variable_4
## 1          1          7  First Row         90
## 2          2          8 Second Row         91
## 3          3          9  Third Row         92
## 4          4         10 Fourth Row         93
## 5          5         11  Fifth Row         94
## 6          6         12  Sixth Row         95

Data Structures - Arrays

These are like matrices, except that they can have more than two dimensions (typically three). You’re not going to see many of these, but we’ll introduce them for completeness. Here is an illustration of what a three-dimensional array could look like:

You can think of ten 3 x 5 bingo cards, each displaying the numbers 1 through 15, as an array. To create that in R, I would use the array() function and write:

bingo_array <- array(seq(1, 15, 1), 
                     dim = c(3, 5, 10))

bingo_array

The general syntax for this function is array(values, dim = c(rows, columns, height)).
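
Selecting from an array works with one index per dimension. The following sketch rebuilds the bingo array and picks out single values and whole cards; fourth_card is a made-up name for this illustration.

```r
bingo_array <- array(seq(1, 15, 1), dim = c(3, 5, 10))

dim(bingo_array) # rows, columns, number of cards
## [1]  3  5 10

# A single value: row 2, column 3 on the first card
bingo_array[2, 3, 1]
## [1] 8

# An entire card (here the fourth one) is a 3 x 5 matrix
fourth_card <- bingo_array[, , 4]

dim(fourth_card)
## [1] 3 5
```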

Data Structures - Lists

List objects can contain a series of the other objects we just learned about. A single list can contain a value, a vector, a matrix, AND a data frame - or many of each!

How do I make a list?

Use the list() function!

# create a vector
example_vec <- c(1, 2, 3, 4, 5, 6, 7, 8)

# create a matrix
example_mat <- matrix(c(1, 2, 3, 4, 5, 6),
                      nrow = 3,
                      ncol = 2)

# create an array
example_array <- array(seq(1, 15, 1), dim = c(3, 5, 10))

example_vec3 <- c(1, 2, 3, 4)


## Store all objects in a list

example_list <- list(example_vec, example_mat, example_array)

example_list
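
Once objects are stored in a list, you will usually want to get them back out. Here is a minimal sketch, using a smaller list with made-up element names (numbers, small_mat), of the two most common ways to do so:

```r
small_list <- list(
  numbers = c(1, 2, 3),
  small_mat = matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)
)

# Double square brackets return the stored object itself
small_list[[1]]
## [1] 1 2 3

# Named elements can also be accessed with $
small_list$numbers
## [1] 1 2 3

# Single square brackets return a (shorter) list instead
class(small_list[1])
## [1] "list"
```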

Selecting elements in a vector, matrix or data.frame

Sometimes we want to select single or multiple data entries from our objects. We can do this by selecting elements via [].

Let’s first do it with a vector. Remember our country_code vector?

country_code <- c("DE", "FR", "NL", "US", "UK")

country_code
## [1] "DE" "FR" "NL" "US" "UK"

Let’s say we only want to select the “US”. We can achieve this by accessing the value via its position in the vector:

country_code[4]
## [1] "US"

Now we want to select all values but the “US”:

country_code[-4]
## [1] "DE" "FR" "NL" "UK"

You can pass multiple indexes as a vector:

country_code[c(1, 2, 3)]
## [1] "DE" "FR" "NL"

1:3 generates the vector c(1, 2, 3) a bit quicker.

country_code[1:3]
## [1] "DE" "FR" "NL"

If we want the values “DE”, “FR”, and “US”, a sequence does not really help. But we can put a vector with a combination of a sequence and some other values in the square brackets:

country_code[c(1:2, 4)]
## [1] "DE" "FR" "US"

Selecting items in two-dimensional objects

We can access values of a matrix similarly. However, we need to think of one additional dimension.

example_mat <- matrix(c(1, 2, 3, 4, 5, 6),
                      nrow = 3,
                      ncol = 2)

example_mat
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

Generally, we type object[row, column] to access specific rows and columns. To see how this works, let’s use our example_mat.

Now we want to access the value 6. It’s in the third row and the second column.

example_mat[3, 2]
## [1] 6

We could also select an entire column (and use it like a vector).

example_mat[, 2]
## [1] 4 5 6
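
Rows work the same way: leave the column position empty instead. A short sketch using the same matrix:

```r
example_mat <- matrix(c(1, 2, 3, 4, 5, 6),
                      nrow = 3,
                      ncol = 2)

# Select the entire second row
example_mat[2, ]
## [1] 2 5

# Select several rows at once; the result is again a matrix
example_mat[c(1, 3), ]
##      [,1] [,2]
## [1,]    1    4
## [2,]    3    6
```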

By accessing values with the [] square brackets, we can also replace values. Let’s say we want to recode the entire first column in example_mat to 99:

example_mat[, 1] <- 99

example_mat
##      [,1] [,2]
## [1,]   99    4
## [2,]   99    5
## [3,]   99    6
example_mat <- matrix(c(1, 2, 3, 4, 5, 6),
                      nrow = 3,
                      ncol = 2)

example_mat[, 1] <- 99
# And we want to recode the first and the third value in the second column
# to 91 and 100

example_mat[c(1, 3), 2] <- c(91, 100)

example_mat
##      [,1] [,2]
## [1,]   99   91
## [2,]   99    5
## [3,]   99  100

Selection with conditions

This is a good start to select and recode data in an object. However, it might be a bit exhausting (maybe even impossible) to always look up the exact position in the object.

Luckily, R also lets us select elements based on conditions. Instead of the position, we put a condition in the [] square brackets.

  • For this we can use several comparison operators:
    • Is equal to: ==
    • Is not equal to: !=
    • Is smaller than: <
    • Is greater than: >
    • Is smaller than or equal to: <=
    • Is greater than or equal to: >=
  • Conditions can be combined with AND and OR operators:
    • AND: &
    • OR: |

So how do conditions work? Let’s create a matrix to work with:

vec1 <- c(1, 2, 3, 4, 5, 6)

vec2 <- c(7, 8, 9, 10, 11, 12)

# And now column-bind (cbind()) the two vectors.

example_mat <- cbind(vec1, vec2)

example_mat
##      vec1 vec2
## [1,]    1    7
## [2,]    2    8
## [3,]    3    9
## [4,]    4   10
## [5,]    5   11
## [6,]    6   12
example_mat > 9 # This returns TRUE or FALSE for each value in the object.
##       vec1  vec2
## [1,] FALSE FALSE
## [2,] FALSE FALSE
## [3,] FALSE FALSE
## [4,] FALSE  TRUE
## [5,] FALSE  TRUE
## [6,] FALSE  TRUE

Now if we put this condition in square brackets we get the values for which the condition is true.

example_mat[example_mat > 9]
## [1] 10 11 12
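
Combining conditions with & and | follows the same pattern. The sketch below uses a made-up vector called scores to show both operators and how a condition can be used to recode values.

```r
scores <- c(3, 8, 12, 15, 1)

# AND: both conditions must be TRUE
scores[scores > 2 & scores < 10]
## [1] 3 8

# OR: at least one condition must be TRUE
scores[scores < 2 | scores > 12]
## [1] 15  1

# Conditions also work for recoding: set all values above 10 to 0
scores[scores > 10] <- 0

scores
## [1] 3 8 0 0 1
```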

Exercise II: Selecting and recoding elements

Here comes the second round of exercises:

  1. Create two vectors vec1 and vec2.

    • vec1 should contain 1, 56, 23, 89, -3 and 5 (in that order).
    • vec2 contains 24, 78, 32, 27, 8 and 1.
  2. Now select elements of vec1 that are greater than 5 or smaller than 0.

  3. Next, set the elements of vec1 to zero where the corresponding element of vec2 is greater than 30 and smaller than or equal to 32.

Please solve all three steps in the next code chunk.

Working with data.frames

Working with data frames is similar to working with matrices and vectors.

Loading and manipulating data

Usually (and especially for this class) we want to work with existing datasets. R knows and can load most of the standard formats of datasets, like .csv, .xlsx (Excel), .dta (Stata), .sav (SPSS) and many more.

So far we have only used R’s base functions. In order to use some more sophisticated or special R functions, we need to load libraries or packages first. Think of these libraries as extra apps that you can install on your smartphone to extend its functionality.

Right now, we want to load a dataset. To read datasets stored in the formats of other statistical software (such as Stata or SPSS), we need the foreign package.

First, we want to have a look at what the package can do.

packageDescription("foreign")
## Package: foreign
## Priority: recommended
## Version: 0.8-86
## Date: 2023-11-26
## Title: Read Data Stored by 'Minitab', 'S', 'SAS', 'SPSS', 'Stata',
##         'Systat', 'Weka', 'dBase', ...
## Depends: R (>= 4.0.0)
## Imports: methods, utils, stats
## Authors@R: c( person("R Core Team", email = "R-core@R-project.org",
##         role = c("aut", "cph", "cre")), person("Roger", "Bivand", role
##         = c("ctb", "cph")), person(c("Vincent", "J."), "Carey", role =
##         c("ctb", "cph")), person("Saikat", "DebRoy", role = c("ctb",
##         "cph")), person("Stephen", "Eglen", role = c("ctb", "cph")),
##         person("Rajarshi", "Guha", role = c("ctb", "cph")),
##         person("Swetlana", "Herbrandt", role = "ctb"),
##         person("Nicholas", "Lewin-Koh", role = c("ctb", "cph")),
##         person("Mark", "Myatt", role = c("ctb", "cph")),
##         person("Michael", "Nelson", role = "ctb"), person("Ben",
##         "Pfaff", role = "ctb"), person("Brian", "Quistorff", role =
##         "ctb"), person("Frank", "Warmerdam", role = c("ctb", "cph")),
##         person("Stephen", "Weigand", role = c("ctb", "cph")),
##         person("Free Software Foundation, Inc.", role = "cph"))
## Contact: see 'MailingList'
## Copyright: see file COPYRIGHTS
## Description: Reading and writing data stored by some versions of 'Epi
##         Info', 'Minitab', 'S', 'SAS', 'SPSS', 'Stata', 'Systat',
##         'Weka', and for reading and writing some 'dBase' files.
## ByteCompile: yes
## Biarch: yes
## License: GPL (>= 2)
## BugReports: https://bugs.r-project.org
## MailingList: R-help@r-project.org
## URL: https://svn.r-project.org/R-packages/trunk/foreign/
## NeedsCompilation: yes
## Packaged: 2023-11-26 16:54:35 UTC; ripley
## Author: R Core Team [aut, cph, cre], Roger Bivand [ctb, cph], Vincent
##         J. Carey [ctb, cph], Saikat DebRoy [ctb, cph], Stephen Eglen
##         [ctb, cph], Rajarshi Guha [ctb, cph], Swetlana Herbrandt [ctb],
##         Nicholas Lewin-Koh [ctb, cph], Mark Myatt [ctb, cph], Michael
##         Nelson [ctb], Ben Pfaff [ctb], Brian Quistorff [ctb], Frank
##         Warmerdam [ctb, cph], Stephen Weigand [ctb, cph], Free Software
##         Foundation, Inc. [cph]
## Maintainer: R Core Team <R-core@R-project.org>
## Repository: CRAN
## Date/Publication: 2023-11-28 06:42:13 UTC
## Built: R 4.4.1; x86_64-w64-mingw32; 2024-06-14 08:34:00 UTC; windows
## Archs: x64
## 
## -- File: C:/Program Files/R/R-4.4.1/library/foreign/Meta/package.rds
# Ok this seems to be useful. So let's load the package to use it.
library(foreign)

You will often come across datasets which are stored as Stata data files. Those files have the extension .dta.

Right now, we want to load the data set called weather_data_germany_2023.dta, which is already stored in the raw_data folder of our project directory:

weather_data <- read.dta("raw_data/weather_data_germany_2023.dta")
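
If you want to try read.dta() without the course files, the following sketch writes a tiny made-up data frame (toy_data) to a temporary Stata file with write.dta() from the same package and reads it back in. Keep in mind that read.dta() is documented for Stata file versions 5 to 12; for files saved by newer Stata versions, a different package such as haven (with its read_dta() function) may be needed.

```r
library(foreign)

# Made-up toy data, written to a temporary Stata file
toy_data <- data.frame(
  city = c("City A", "City B"),
  mean_temp = c(10.5, 7.2),
  stringsAsFactors = FALSE
)
toy_path <- tempfile(fileext = ".dta")
write.dta(toy_data, toy_path)

# Read the file back in, just like the course data
toy_loaded <- read.dta(toy_path)

names(toy_loaded)
## [1] "city"      "mean_temp"
```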

The data contains yearly temperature averages of German cities as well as their geographical location (longitude and latitude). It comes from the “Deutscher Wetterdienst” and you can find it here. Now that we have loaded the data, we can have a look at it.

With head() we can look at the first six rows of the data set:

head(weather_data)
##                         city longitude latitude mean_temp
## 1 Sigmarszell-Zeisertsweiler  9.740446 47.57760     11.14
## 2         Obersulm-Willsbach  9.352493 49.12801     12.28
## 3                   Röllbach  9.253038 49.76440     11.37
## 4     Padenstedt (Pony-Park)  9.925507 54.01884     10.22
## 5            Elzach-Fisnacht  8.108840 48.20121     11.32
## 6           Lippspringe, Bad  8.838795 51.78542     11.12

But we can also look at the entire data set:

weather_data

If we only want to look at the variable names, we can use names():

names(weather_data)
## [1] "city"      "longitude" "latitude"  "mean_temp"

Now we can use our selecting abilities on a data frame. As before we can select elements via their numeric position:

weather_data[1, 2] # first row, second column
## [1] 9.740446
weather_data[1:3, 1] # rows 1-3, first column
## [1] "Sigmarszell-Zeisertsweiler" "Obersulm-Willsbach"        
## [3] "Röllbach"

Additionally, as columns usually have names in data frames, we can use the column names to select values in two ways.

First, we can put the column name in square brackets instead of a column number:

weather_data[1, "city"]
## [1] "Sigmarszell-Zeisertsweiler"
weather_data[, "mean_temp"]

We can also look at two variables at once:

weather_data[, c("city", "mean_temp")]

Second, we can also select an entire column by using the $ operator with the column name: data.frame_name$column_name. Just like this:

weather_data$mean_temp
##   [1] 11.14 12.28 11.37 10.22 11.32 11.12 10.73 11.12  9.07  9.83 10.97 10.35
##  [13] 10.49 10.06 10.48 10.09  7.69 10.89 10.72 10.39 11.70 10.56 12.43 11.26
##  [25] 12.13 10.13  9.90 11.71 10.52  9.95 11.55 10.94  8.83 11.40 10.63 10.55
##  [37] 10.51 11.19  9.90 10.70  9.67 12.31 11.44 10.69 10.69  9.83 11.29 10.35
##  [49] 10.10 11.60  9.85 11.38 10.17  9.51 10.25  9.42 10.03 10.32  8.31 10.29
##  [61]  9.50 11.41  9.73 10.79 10.69  9.40 10.08  7.88 10.26 11.35 12.79 11.12
##  [73] 10.37  9.04  8.61 10.71 10.48 10.15 12.02  7.26 11.72 10.60 11.10 10.01
##  [85] 10.39 10.34 10.52  8.52 11.59  7.12  8.82 10.50 10.16 10.11  9.75 10.22
##  [97] 10.96 12.55 11.27 10.90 11.14 10.87 10.29 10.67 11.14 10.39 11.03  8.85
## [109] 10.78  7.67 10.62 10.37 11.67 10.78 10.70 10.04  8.79 13.14  9.99 10.36
## [121] 11.21 10.66 10.43 12.41 12.09 11.14 12.83 11.66 10.38 10.80 10.26 11.41
## [133] 10.25 10.90 10.90  9.73 11.23 10.58  9.66 10.78  9.89 10.98 10.16 10.43
## [145] 10.88 11.24 10.87 12.24  9.93  9.73 11.37 10.85 10.76 10.23 11.56 12.06
## [157]  8.29 11.23 10.57 12.17 11.04  4.76 10.73 11.79 10.56 10.69 10.53 10.61
## [169] 10.76  7.94 10.61 10.47 11.15 10.49 10.62 11.24 10.64 11.23 12.01  8.71
## [181] 12.45 12.31 10.79 10.14 10.83 10.38 10.74 10.31  9.28 11.03  9.46 10.60
## [193] 10.19 10.56 11.41  8.67 10.92 10.57 10.33 10.75 10.52 10.59 11.64  5.48
## [205] 11.52 10.07 10.56 10.15 11.62 10.98 11.85 10.42 10.05 10.59 10.28 11.32
## [217]  9.71 11.64  9.43 10.10 11.98 11.14 -2.90 10.99 10.09 10.58 11.81 11.15
## [229] 10.01 12.31 10.33 10.35 11.19 11.35  8.57 11.18  9.70 10.11  8.93 11.22
## [241] 12.32 10.30 10.34 10.65 11.31 11.96 11.04 10.22 10.64 10.24 10.03  9.40
## [253] 10.92 11.24 11.08 10.46 11.69 11.16  9.93  9.89 12.94 11.19 10.58 10.30
## [265] 11.08 10.34  9.96 10.49 10.36 10.66 11.05 10.19 11.06 10.47 10.25 10.57
## [277] 10.96 11.05 12.46  9.97 11.38 10.63 11.14 10.23 11.13 11.38 11.83  9.89
## [289] 12.41 11.08 10.90  9.50 10.07 11.67 11.82  8.96 10.38 11.50 10.54 10.72
## [301]  6.88 10.66  9.29 10.58 10.26 12.27 10.23 10.99 10.52 11.10  9.80 11.57
## [313] 10.44 10.82 11.13 10.87 11.18 10.16 10.03  9.46  9.28 10.89 12.83 10.05
## [325] 10.72 12.00 10.59 12.15 10.42 11.68 11.07  7.30 11.32 11.17 10.85  9.84
## [337] 10.39 10.99 10.93  8.73 11.31 11.44  8.45 11.41 10.30 10.38  9.19  9.88
## [349]  9.81 11.69 10.50  9.26 10.39 12.68 10.19 10.85  6.82 10.23 10.38 10.94
## [361] 10.28  9.28 10.81 12.30 10.19 11.35 12.03 10.09 10.97 10.97 11.40 12.67
## [373] 10.02 12.16 10.64  9.65 11.02 10.91 10.49  4.91 10.08 11.19 10.58 10.74
## [385] 11.18 10.89 11.76 11.69 10.61 10.26 10.46 11.52  9.45  9.96 10.70  9.97
## [397] 10.16 11.12 11.32 10.31 12.80 10.56  9.91 12.71 10.28 10.94 12.11 10.57
## [409] 11.82 11.42 10.78 10.36  8.21 10.72 10.40 10.26 10.19 10.45 11.62 11.20
## [421] 10.73 11.20 12.33  5.21 11.32 12.02 10.33 10.73 11.50 10.79 10.83 11.23
## [433] 10.45  8.92 11.07 11.55 10.82 11.19  9.13 10.20 10.84  9.16  8.66 11.11
## [445]  9.96 10.69 10.73 10.73 12.60 11.21 10.02 11.08 10.97 12.68

Columns of a data frame are essentially vectors, so we can apply all the operations and functions that work on vectors (depending on the column's class).

weather_data$mean_temp[1] # For example, we can select an element of the vector
## [1] 11.14
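
To make this concrete, here is a minimal sketch on a small made-up data frame. The name toy_data and its values are assumptions for illustration only and are not part of weather_data. It shows that a column extracted with $ behaves like any other numeric vector.

```r
# Hypothetical toy data frame, used only for illustration
toy_data <- data.frame(
  city = c("A", "B", "C"),
  mean_temp = c(11.5, 7.5, 10.0)
)

# Extract one column; the result is an ordinary vector
temps <- toy_data$mean_temp

is.vector(temps)   # check that the column is a plain vector
class(temps)       # the class decides which functions make sense
length(temps)      # number of elements
temps[2]           # select the second element
temps + 1          # arithmetic is applied element by element
temps[temps > 10]  # a logical condition selects the matching elements
```

The same calls should work on weather_data$mean_temp, since it is also a numeric column.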

What if we want to add a new variable? Let’s create a variable named “cold”.

weather_data$cold <- 0

# What does this do?

weather_data$cold
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 0 0

Now, we want to recode “cold” to 1 for cities whose mean temperature is lower than 8 degrees Celsius.

weather_data$cold <- 0

weather_data$cold[weather_data$mean_temp < 8] <- 1

# Let's have a look at both variables:

weather_data[, c("city", "mean_temp", "cold")]
##                                     city mean_temp cold
## 1             Sigmarszell-Zeisertsweiler     11.14    0
## 2                     Obersulm-Willsbach     12.28    0
## 3                               Röllbach     11.37    0
## 4                 Padenstedt (Pony-Park)     10.22    0
## 5                        Elzach-Fisnacht     11.32    0
## 6                       Lippspringe, Bad     11.12    0
## 7                              Ummendorf     10.73    0
## 8                                 Tholey     11.12    0
## 9                 Garmisch-Partenkirchen      9.07    0
## 10                             Veilsdorf      9.83    0
## 11                           Wernigerode     10.97    0
## 12                           Pelzerhaken     10.35    0
## 13                 Balingen-Bronnhaupten     10.49    0
## 14                               Kronach     10.06    0
## 15                            Heckelberg     10.48    0
## 16                 Kaisersbach-Cronhütte     10.09    0
## 17                    Kleiner Inselsberg      7.69    1
## 18                  Starkenberg-Tegkwitz     10.89    0
## 19                            Schwandorf     10.72    0
## 20                             Quickborn     10.39    0
## 21                             Darmstadt     11.70    0
## 22            Staffelstein, Bad-Stublang     10.56    0
## 23                            Geisenheim     12.43    0
## 24                    Rahden-Kleinendorf     11.26    0
## 25                   Heinsberg-Schleiden     12.13    0
## 26                Eichstätt-Landershofen     10.13    0
## 27            Parsberg/Oberpfalz-Eglwang      9.90    0
## 28                           Perl-Nennig     11.71    0
## 29                               Warburg     10.52    0
## 30               Altheim, Kreis Biberach      9.95    0
## 31         Friedrichshafen-Unterraderach     11.55    0
## 32                   Wangerland-Hooksiel     10.94    0
## 33                     Lenzkirch-Ruhbühl      8.83    0
## 34              Neunkirchen-Wellesweiler     11.40    0
## 35                            Boizenburg     10.63    0
## 36                       Leuchtturm Kiel     10.55    0
## 37                   Rosengarten-Klecken     10.51    0
## 38                                Artern     11.19    0
## 39                                 Barth      9.90    0
## 40                    Schlüchtern-Herolz     10.70    0
## 41          Neustadt am Kulm-Filchendorf      9.67    0
## 42                            Düsseldorf     12.31    0
## 43               Freudenberg/Main-Boxtal     11.44    0
## 44                  Weißenburg-Emetzheim     10.69    0
## 45            Querfurt-Mühle Lodersleben     10.69    0
## 46                  Oberhaching-Laufzorn      9.83    0
## 47                            Wusterwitz     11.29    0
## 48                      Königshofen, Bad     10.35    0
## 49                 Ostenfeld (Rendsburg)     10.10    0
## 50                 Wuppertal-Buchenhofen     11.60    0
## 51                            Karlshagen      9.85    0
## 52                               Wolfach     11.38    0
## 53                            Martinroda     10.17    0
## 54                         Oberviechtach      9.51    0
## 55                    Hasenkrug-Hardebek     10.25    0
## 56                           Waldmünchen      9.42    0
## 57                   Schorndorf-Knöbling     10.03    0
## 58                           Blankenrath     10.32    0
## 59                             Birx/Rhön      8.31    0
## 60                                   Aue     10.29    0
## 61                 Kaufbeuren-Oberbeuren      9.50    0
## 62                             Pirmasens     11.41    0
## 63                               Stötten      9.73    0
## 64                               Görlitz     10.79    0
## 65                      Waldems-Reinborn     10.69    0
## 66                           Pfullendorf      9.40    0
## 67               Neubulach-Oberhaugstett     10.08    0
## 68               Kleiner Feldberg/Taunus      7.88    1
## 69                          Trollenhagen     10.26    0
## 70                 Bernburg/Saale (Nord)     11.35    0
## 71                                  Lahr     12.79    0
## 72         Cölbe, Kr. Marburg-Biedenkopf     11.12    0
## 73                 Steinau, Kr. Cuxhaven     10.37    0
## 74                       Lobenstein, Bad      9.04    0
## 75                            Oberstdorf      8.61    0
## 76                             Göttingen     10.71    0
## 77                              Mühldorf     10.48    0
## 78                                 Erfde     10.15    0
## 79                Königswinter-Heiderhof     12.02    0
## 80                           Wasserkuppe      7.26    1
## 81                   Borken in Westfalen     11.72    0
## 82                            Müncheberg     10.60    0
## 83                                Bremen     11.10    0
## 84                    Kiefersfelden-Gach     10.01    0
## 85                               Grambek     10.39    0
## 86               Lichtenhain-Mittelndorf     10.34    0
## 87                         Erfurt-Weimar     10.52    0
## 88            Oberharz am Brocken-Stiege      8.52    0
## 89                      Trier-Petrisberg     11.59    0
## 90                          Kahler Asten      7.12    1
## 91                    Schneifelforsthaus      8.82    0
## 92                              Chieming     10.50    0
## 93                   Moringen-Lutterbeck     10.16    0
## 94                         Stechlin-Menz     10.11    0
## 95                               Kempten      9.75    0
## 96                  Wittstock-Rote Mühle     10.22    0
## 97                          Großenkneten     10.96    0
## 98                              Müllheim     12.55    0
## 99               Möhrendorf-Kleinseebach     11.27    0
## 100                     Landshut-Reithof     10.90    0
## 101                                 Belm     11.14    0
## 102                Klipphausen-Garsebach     10.87    0
## 103                               Grünow     10.29    0
## 104                Michelstadt-Vielbrunn     10.67    0
## 105                              Potsdam     11.14    0
## 106                Weihenstephan-Dürnast     10.39    0
## 107                   Doberlug-Kirchhain     11.03    0
## 108                              Zwiesel      8.85    0
## 109                     Wittingen-Vorhop     10.78    0
## 110           Deutschneudorf-Brüderwiese      7.67    1
## 111                   Sankt Peter-Ording     10.62    0
## 112                              Marnitz     10.37    0
## 113                          Michelstadt     11.67    0
## 114                       Kissingen, Bad     10.78    0
## 115                        Ruppertsecken     10.70    0
## 116                               Plauen     10.04    0
## 117                     Elster, Bad-Sohl      8.79    0
## 118                   Waghäusel-Kirrlach     13.14    0
## 119               Feuchtwangen-Heilbronn      9.99    0
## 120                    Lennestadt-Theten     10.36    0
## 121                   Berlin Brandenburg     11.21    0
## 122                          Muskau, Bad     10.66    0
## 123                        Waltershausen     10.43    0
## 124                            Kahl/Main     12.41    0
## 125                      Geldern-Walbeck     12.09    0
## 126                   Berlin-Dahlem (FU)     11.14    0
## 127                             Mannheim     12.83    0
## 128                             Würzburg     11.66    0
## 129                          Ueckermünde     10.38    0
## 130           Naumburg/Saale-Kreipitzsch     10.80    0
## 131                 Hermaringen-Allewind     10.26    0
## 132                       Aachen-Orsbach     11.41    0
## 133                             Hohwacht     10.25    0
## 134                               Baruth     10.90    0
## 135                 Helmstedt-Emmerstedt     10.90    0
## 136                        Ulm-Mähringen      9.73    0
## 137                             Hannover     11.23    0
## 138                Altomünster-Maisbrunn     10.58    0
## 139                               Eslohe      9.66    0
## 140                        Fritzlar/Eder     10.78    0
## 141                 Feldberg/Mecklenburg      9.89    0
## 142                Leuchtturm Alte Weser     10.98    0
## 143                           Greifswald     10.16    0
## 144                       Idar-Oberstein     10.43    0
## 145                    Krölpa-Rockendorf     10.88    0
## 146              Schwäbisch Gmünd-Weiler     11.24    0
## 147                          Lenzen/Elbe     10.87    0
## 148                            Andernach     12.24    0
## 149                             Tribsees      9.93    0
## 150                              Schleiz      9.73    0
## 151                            Mühlacker     11.37    0
## 152                            Hümmerich     10.85    0
## 153           Dillingen/Donau-Fristingen     10.76    0
## 154                              Dörnick     10.23    0
## 155                  Pforzheim-Ispringen     11.56    0
## 156                               Bochum     12.06    0
## 157                            Braunlage      8.29    0
## 158                               Dörpen     11.23    0
## 159              Amberg-Unterammersricht     10.57    0
## 160                          Sachsenheim     12.17    0
## 161                            Seehausen     11.04    0
## 162                         Großer Arber      4.76    1
## 163                   Lohr/Main-Halsbach     10.73    0
## 164                      Eppingen-Elsenz     11.79    0
## 165                  Oberzent-Beerfelden     10.56    0
## 166                 Reichshof-Eckenhagen     10.69    0
## 167         Neuburg/Kammel-Langenhaslach     10.53    0
## 168                             Schwerin     10.61    0
## 169                     Weimar-Schöndorf     10.76    0
## 170                 Wernigerode-Schierke      7.94    1
## 171         Geringswalde-Altgeringswalde     10.61    0
## 172                  Schmieritz-Weltwitz     10.47    0
## 173                            Helgoland     11.15    0
## 174                         Gottfrieding     10.49    0
## 175                       Kirchdorf/Poel     10.62    0
## 176                                Berus     11.24    0
## 177                            Trostberg     10.64    0
## 178                              Dachwig     11.23    0
## 179                            Metzingen     12.01    0
## 180                           Marienberg      8.71    0
## 181                       Duisburg-Baerl     12.45    0
## 182            Stuttgart (Schnarrenberg)     12.31    0
## 183                            Hechingen     10.79    0
## 184                                 Hahn     10.14    0
## 185                           Reimlingen     10.83    0
## 186 Mallersdorf-Pfaffenberg-Oberlindhart     10.38    0
## 187                        Hersfeld, Bad     10.74    0
## 188                              Itzehoe     10.31    0
## 189                           Merklingen      9.28    0
## 190                   Lübben-Blumenfelde     11.03    0
## 191                 Prackenbach-Neuhäusl      9.46    0
## 192       Kirchberg/Jagst-Herboldshausen     10.60    0
## 193                               Ebrach     10.19    0
## 194                    München-Flughafen     10.56    0
## 195                              Jeßnitz     11.41    0
## 196                        Reit im Winkl      8.67    0
## 197                   Wiesbaden-Auringen     10.92    0
## 198                       Wendisch Evern     10.57    0
## 199                               Wacken     10.33    0
## 200                  Hamburg-Fuhlsbüttel     10.75    0
## 201               Donauwörth-Osterweiler     10.52    0
## 202                          Bremervörde     10.59    0
## 203                           Buchenbach     11.64    0
## 204                 Feldberg/Schwarzwald      5.48    1
## 205                      Köthen (Anhalt)     11.52    0
## 206                 Heinersreuth-Vollhof     10.07    0
## 207                            Zehdenick     10.56    0
## 208                   Gräfenberg-Kasberg     10.15    0
## 209                             Wunstorf     11.62    0
## 210                                Berge     10.98    0
## 211                            Kitzingen     11.85    0
## 212                              Teterow     10.42    0
## 213               Manderscheid-Sonnenhof     10.05    0
## 214                Renningen-Ihinger Hof     10.59    0
## 215                 Twistetal-Mühlhausen     10.28    0
## 216               Bevern, Kr. Holzminden     11.32    0
## 217                            Meiningen      9.71    0
## 218                                Kleve     11.64    0
## 219                        Sohland/Spree      9.43    0
## 220          Langenwetzendorf-Göttendorf     10.10    0
## 221                Weilerswist-Lommersum     11.98    0
## 222                            Osterfeld     11.14    0
## 223                            Zugspitze     -2.90    1
## 224                Friesoythe-Altenoythe     10.99    0
## 225                           Leinefelde     10.09    0
## 226                        Freiburg/Elbe     10.58    0
## 227                    Waltrop-Abdinghof     11.81    0
## 228                       Salzuflen, Bad     11.15    0
## 229               Berka, Bad (Flugplatz)     10.01    0
## 230                           Tönisvorst     12.31    0
## 231                             Chemnitz     10.33    0
## 232                             Goldberg     10.35    0
## 233                             Nürnberg     11.19    0
## 234            Barsinghausen-Hohenbostel     11.35    0
## 235               Berleburg, Bad-Stünzel      8.57    0
## 236                            Norderney     11.18    0
## 237                          Kall-Sistig      9.70    0
## 238                          Lüdenscheid     10.11    0
## 239                           Klippeneck      8.93    0
## 240                         Braunschweig     11.22    0
## 241                  Saarbrücken-Burbach     12.32    0
## 242                         Alsfeld-Eifa     10.30    0
## 243                          Schwarzburg     10.34    0
## 244                           Wiesenburg     10.65    0
## 245                        Leipzig/Halle     11.31    0
## 246                         Nauheim, Bad     11.96    0
## 247                        Harzburg, Bad     11.04    0
## 248                   Grambow-Schwennenz     10.22    0
## 249                               Uelzen     10.64    0
## 250                     Amerang-Pfaffing     10.24    0
## 251                          Holzkirchen     10.03    0
## 252               Villingen-Schwenningen      9.40    0
## 253                              Bamberg     10.92    0
## 254                           Wittenberg     11.24    0
## 255                      Möckern-Drewitz     11.08    0
## 256                              Fehmarn     10.46    0
## 257                 Lippstadt-Bökenförde     11.69    0
## 258                               Singen     11.16    0
## 259                               Arkona      9.93    0
## 260              Meinerzhagen-Redlendorf      9.89    0
## 261                             Freiburg     12.94    0
## 262               Gevelsberg-Oberbröking     11.19    0
## 263                         Lichtentanne     10.58    0
## 264             Schauenburg-Elgershausen     10.30    0
## 265                  Schonungen-Mainberg     11.08    0
## 266                 Lautertal-Oberlauter     10.34    0
## 267              Pommelsbrunn-Mittelburg      9.96    0
## 268                           Deuselbach     10.49    0
## 269                             Herzberg     10.36    0
## 270                Aldersbach-Kramersepp     10.66    0
## 271                           Lindenberg     11.05    0
## 272                            Gelbelsee     10.19    0
## 273                            Alfhausen     11.06    0
## 274                             Augsburg     10.47    0
## 275                               Metten     10.25    0
## 276                               Kyritz     10.57    0
## 277                           Gardelegen     10.96    0
## 278                             Eschwege     11.05    0
## 279                       Frankfurt/Main     12.46    0
## 280                            Memmingen      9.97    0
## 281                Klitzschen bei Torgau     11.38    0
## 282                            Straubing     10.63    0
## 283                  Wolfsburg (Südwest)     11.14    0
## 284                  Burgwald-Bottendorf     10.23    0
## 285               Kubschütz, Kr. Bautzen     11.13    0
## 286                            Notzingen     11.38    0
## 287                       Essen-Bredeney     11.83    0
## 288             Saldenburg-Entschenreuth      9.89    0
## 289                                Worms     12.41    0
## 290                               Bassum     11.08    0
## 291                            Manschnow     10.90    0
## 292             Neukirchen-Hauptschwenda      9.50    0
## 293                            Schleswig     10.07    0
## 294                    Jena (Sternwarte)     11.67    0
## 295                            Waibstadt     11.82    0
## 296             Oy-Mittelberg-Petersthal      8.96    0
## 297                    Lübeck-Blankensee     10.38    0
## 298     Neunkirchen-Seelscheid-Krawinkel     11.50    0
## 299                 Neuruppin-Alt Ruppin     10.54    0
## 300                               Seesen     10.72    0
## 301                 Zinnwald-Georgenfeld      6.88    1
## 302            Mittelnkirchen-Hohenfelde     10.66    0
## 303              Tirschenreuth-Lodermühl      9.29    0
## 304                               Soltau     10.58    0
## 305                               Piding     10.26    0
## 306                Emmendingen-Mundingen     12.27    0
## 307                            Hattstedt     10.23    0
## 308                          Berlin-Buch     10.99    0
## 309                 Ellwangen-Rindelbach     10.52    0
## 310                              Genthin     11.10    0
## 311                               Putbus      9.80    0
## 312                        München-Stadt     11.57    0
## 313     Salzungen, Bad-Gräfen-Nitzendorf     10.44    0
## 314                       Langenlipsdorf     10.82    0
## 315               Aschersleben-Mehringen     11.13    0
## 316             Rothenburg ob der Tauber     10.87    0
## 317                   Holzdorf-Bernsdorf     11.18    0
## 318               Schönhagen (Ostseebad)     10.16    0
## 319                             Sandberg     10.03    0
## 320                Leutkirch-Herlazhofen      9.46    0
## 321                      Grainet-Rehberg      9.28    0
## 322                          Simbach/Inn     10.89    0
## 323                             Ohlsbach     12.83    0
## 324                    Bertsdorf-Hörnitz     10.05    0
## 325                Worpswede-Hüttenbusch     10.72    0
## 326               Schaafheim-Schlierbach     12.00    0
## 327                        Gera-Leumnitz     10.59    0
## 328                 Offenbach-Wetterpark     12.15    0
## 329                             Günzburg     10.42    0
## 330                   Dresden-Hosterwitz     11.68    0
## 331                             Cuxhaven     11.07    0
## 332                   Neuhaus am Rennweg      7.30    1
## 333                    Hameln-Hastenbeck     11.32    0
## 334                     Borkum-Flugplatz     11.17    0
## 335                   Rostock-Warnemünde     10.85    0
## 336                    Steinhagen-Negast      9.84    0
## 337                             Weinbiet     10.39    0
## 338                                Emden     10.99    0
## 339                     Nideggen-Schmidt     10.93    0
## 340                 Schönwald/Ofr.-Brunn      8.73    0
## 341                      Runkel-Ennerich     11.31    0
## 342                   Leipzig-Holzhausen     11.44    0
## 343    Fichtelberg/Oberfranken-Hüttstadl      8.45    0
## 344                          Quedlinburg     11.41    0
## 345                               Sontra     10.30    0
## 346                          Fürstenzell     10.38    0
## 347               Münsingen-Apfelstetten      9.19    0
## 348                               Treuen      9.88    0
## 349                       Siegsdorf-Höll      9.81    0
## 350                       Kaiserslautern     11.69    0
## 351                        Kiel-Holtenau     10.50    0
## 352                           Harzgerode      9.26    0
## 353                         Elpersbüttel     10.39    0
## 354               Frankfurt/Main-Westend     12.68    0
## 355                      Lügde-Paenbruch     10.19    0
## 356                               Nossen     10.85    0
## 357                            Carlsfeld      6.82    1
## 358                               Anklam     10.23    0
## 359                    Ebersberg-Halbing     10.38    0
## 360                               Lüchow     10.94    0
## 361             Markt Erlbach-Hagenhofen     10.28    0
## 362                                  Hof      9.28    0
## 363                              Olsdorf     10.81    0
## 364                          Trier-Zewen     12.30    0
## 365                            Tann/Rhön     10.19    0
## 366                  Saarbrücken-Ensheim     11.35    0
## 367                Baden-Baden-Geroldsau     12.03    0
## 368                               Weiden     10.09    0
## 369                   Arnstein-Müdesheim     10.97    0
## 370                 Bielefeld-Deppendorf     10.97    0
## 371                        Lingen-Baccum     11.40    0
## 372                        Dürkheim, Bad     12.67    0
## 373                        Groß Lüsewitz     10.02    0
## 374              Neuenahr, Bad-Ahrweiler     12.16    0
## 375          Mühlhausen/Thüringen-Görmar     10.64    0
## 376                     Sigmaringen-Laiz      9.65    0
## 377                          Olbersleben     11.02    0
## 378                           Hilgenroth     10.91    0
## 379                           Angermünde     10.49    0
## 380                              Brocken      4.91    1
## 381                             Rottweil     10.08    0
## 382                             Diepholz     11.19    0
## 383                              Harburg     10.58    0
## 384                    Elsendorf-Horneck     10.74    0
## 385                Hamburg-Neuwiedenthal     11.18    0
## 386                  Ingolstadt-Manching     10.89    0
## 387               Lüdinghausen-Brochtrup     11.76    0
## 388                            Magdeburg     11.69    0
## 389          Buchen, Kr. Neckar-Odenwald     10.61    0
## 390                           Wittenborn     10.26    0
## 391                       Maisach-Galgen     10.46    0
## 392                             Konstanz     11.52    0
## 393                Dachsberg-Wolpadingen      9.45    0
## 394                                 Leck      9.96    0
## 395                      Arnsberg-Neheim     10.70    0
## 396                      Schleswig-Jagel      9.97    0
## 397                             Attenkam     10.16    0
## 398                          Hoyerswerda     11.12    0
## 399                              Cottbus     11.32    0
## 400                          Boltenhagen     10.31    0
## 401                       Köln-Stammheim     12.80    0
## 402                 Löhnberg-Obershausen     10.56    0
## 403              Dippoldiswalde-Reinberg      9.91    0
## 404                      Bergzabern, Bad     12.71    0
## 405                     Simmern-Wahlbach     10.28    0
## 406              Wutöschingen-Ofteringen     10.94    0
## 407                             Öhringen     12.11    0
## 408                          Fulda-Horas     10.57    0
## 409                                 Werl     11.82    0
## 410                Ennigerloh-Ostenfelde     11.42    0
## 411                               Alfeld     10.78    0
## 412                     Greifswalder Oie     10.36    0
## 413                  Meßstetten-Appental      8.21    0
## 414                                 Roth     10.72    0
## 415                      Hiddensee-Vitte     10.40    0
## 416                      Neu-Ulrichstein     10.26    0
## 417         Weidenbach-Weiherschneidbach     10.19    0
## 418                       Waren (Müritz)     10.45    0
## 419                    Münster/Osnabrück     11.62    0
## 420                             Nienburg     11.20    0
## 421             Falkenberg,Kr.Rottal-Inn     10.73    0
## 422                          Groß Berßen     11.20    0
## 423                          Rheinfelden     12.33    0
## 424                          Fichtelberg      5.21    1
## 425                      Lauchstädt, Bad     11.32    0
## 426                            Köln/Bonn     12.02    0
## 427              Wielenbach (Demollstr.)     10.33    0
## 428                 Neuburg an der Donau     10.73    0
## 429                                Ahaus     11.50    0
## 430                    Rotenburg (Wümme)     10.79    0
## 431                            Rosenheim     10.83    0
## 432                              Oschatz     11.23    0
## 433                             Eisenach     10.45    0
## 434                 Wunsiedel-Schönbrunn      8.92    0
## 435            Ingelfingen-Stachenhausen     11.07    0
## 436                     Berlin-Tempelhof     11.55    0
## 437                           Regensburg     10.82    0
## 438                     Weiskirchen/Saar     11.19    0
## 439                      Hohenpeißenberg      9.13    0
## 440                      Laage-Kronskamp     10.20    0
## 441                   Schipkau-Klettwitz     10.84    0
## 442                         Freudenstadt      9.16    0
## 443                           Teuschnitz      8.66    0
## 444                               Demker     11.11    0
## 445                    Nürburg-Barweiler      9.96    0
## 446                              Coschen     10.69    0
## 447              Großerlach-Mannenweiler     10.73    0
## 448                             Kösching     10.73    0
## 449              Rheinau-Memprechtshofen     12.60    0
## 450                    Dresden-Klotzsche     11.21    0
## 451                            Geisingen     10.02    0
## 452                                Zeitz     11.08    0
## 453           Weingarten, Kr. Ravensburg     10.97    0
## 454                         Rheinstetten     12.68    0
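
The recoding step above relies on logical indexing: the comparison weather_data$mean_temp < 8 yields a vector of TRUE and FALSE values, and only the positions that are TRUE are overwritten with 1. The following sketch repeats the idea on a made-up data frame (toy_data and its values are assumptions for illustration) and shows ifelse() as one possible alternative that does both steps in a single call.

```r
# Hypothetical toy data frame, used only for illustration
toy_data <- data.frame(
  city = c("A", "B", "C", "D"),
  mean_temp = c(11.5, 7.5, 10.0, -2.5)
)

# Step 1: set the new variable to 0 for every row
toy_data$cold <- 0

# Step 2: build a logical vector and overwrite only the TRUE positions
is_cold <- toy_data$mean_temp < 8
toy_data$cold[is_cold] <- 1

# Alternative: ifelse() returns 1 where the condition holds and 0 otherwise
toy_data$cold_alt <- ifelse(toy_data$mean_temp < 8, 1, 0)

# Both approaches give the same result
identical(toy_data$cold, toy_data$cold_alt)

# Count how many rows fall into each category
table(toy_data$cold)
```

Calling table() on the new variable is a quick way to check the recoding without printing every row.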

Calculating Measures of Central Tendency and Variability

Let’s look at the measures of central tendency and variability from the lecture (starting at slide 17).

Consider the following vector:

example_vec <- c(1, 2, 3, 4, 5)

How could we calculate the mean of example_vec?

We could simply calculate it “by hand”:

(1 + 2 + 3 + 4 + 5) / 5
## [1] 3

But this approach does not scale to an actual vector in our data frame, e.g., the mean temperature:

weather_data$mean_temp
##   [1] 11.14 12.28 11.37 10.22 11.32 11.12 10.73 11.12  9.07  9.83 10.97 10.35
##  [13] 10.49 10.06 10.48 10.09  7.69 10.89 10.72 10.39 11.70 10.56 12.43 11.26
##  [25] 12.13 10.13  9.90 11.71 10.52  9.95 11.55 10.94  8.83 11.40 10.63 10.55
##  [37] 10.51 11.19  9.90 10.70  9.67 12.31 11.44 10.69 10.69  9.83 11.29 10.35
##  [49] 10.10 11.60  9.85 11.38 10.17  9.51 10.25  9.42 10.03 10.32  8.31 10.29
##  [61]  9.50 11.41  9.73 10.79 10.69  9.40 10.08  7.88 10.26 11.35 12.79 11.12
##  [73] 10.37  9.04  8.61 10.71 10.48 10.15 12.02  7.26 11.72 10.60 11.10 10.01
##  [85] 10.39 10.34 10.52  8.52 11.59  7.12  8.82 10.50 10.16 10.11  9.75 10.22
##  [97] 10.96 12.55 11.27 10.90 11.14 10.87 10.29 10.67 11.14 10.39 11.03  8.85
## [109] 10.78  7.67 10.62 10.37 11.67 10.78 10.70 10.04  8.79 13.14  9.99 10.36
## [121] 11.21 10.66 10.43 12.41 12.09 11.14 12.83 11.66 10.38 10.80 10.26 11.41
## [133] 10.25 10.90 10.90  9.73 11.23 10.58  9.66 10.78  9.89 10.98 10.16 10.43
## [145] 10.88 11.24 10.87 12.24  9.93  9.73 11.37 10.85 10.76 10.23 11.56 12.06
## [157]  8.29 11.23 10.57 12.17 11.04  4.76 10.73 11.79 10.56 10.69 10.53 10.61
## [169] 10.76  7.94 10.61 10.47 11.15 10.49 10.62 11.24 10.64 11.23 12.01  8.71
## [181] 12.45 12.31 10.79 10.14 10.83 10.38 10.74 10.31  9.28 11.03  9.46 10.60
## [193] 10.19 10.56 11.41  8.67 10.92 10.57 10.33 10.75 10.52 10.59 11.64  5.48
## [205] 11.52 10.07 10.56 10.15 11.62 10.98 11.85 10.42 10.05 10.59 10.28 11.32
## [217]  9.71 11.64  9.43 10.10 11.98 11.14 -2.90 10.99 10.09 10.58 11.81 11.15
## [229] 10.01 12.31 10.33 10.35 11.19 11.35  8.57 11.18  9.70 10.11  8.93 11.22
## [241] 12.32 10.30 10.34 10.65 11.31 11.96 11.04 10.22 10.64 10.24 10.03  9.40
## [253] 10.92 11.24 11.08 10.46 11.69 11.16  9.93  9.89 12.94 11.19 10.58 10.30
## [265] 11.08 10.34  9.96 10.49 10.36 10.66 11.05 10.19 11.06 10.47 10.25 10.57
## [277] 10.96 11.05 12.46  9.97 11.38 10.63 11.14 10.23 11.13 11.38 11.83  9.89
## [289] 12.41 11.08 10.90  9.50 10.07 11.67 11.82  8.96 10.38 11.50 10.54 10.72
## [301]  6.88 10.66  9.29 10.58 10.26 12.27 10.23 10.99 10.52 11.10  9.80 11.57
## [313] 10.44 10.82 11.13 10.87 11.18 10.16 10.03  9.46  9.28 10.89 12.83 10.05
## [325] 10.72 12.00 10.59 12.15 10.42 11.68 11.07  7.30 11.32 11.17 10.85  9.84
## [337] 10.39 10.99 10.93  8.73 11.31 11.44  8.45 11.41 10.30 10.38  9.19  9.88
## [349]  9.81 11.69 10.50  9.26 10.39 12.68 10.19 10.85  6.82 10.23 10.38 10.94
## [361] 10.28  9.28 10.81 12.30 10.19 11.35 12.03 10.09 10.97 10.97 11.40 12.67
## [373] 10.02 12.16 10.64  9.65 11.02 10.91 10.49  4.91 10.08 11.19 10.58 10.74
## [385] 11.18 10.89 11.76 11.69 10.61 10.26 10.46 11.52  9.45  9.96 10.70  9.97
## [397] 10.16 11.12 11.32 10.31 12.80 10.56  9.91 12.71 10.28 10.94 12.11 10.57
## [409] 11.82 11.42 10.78 10.36  8.21 10.72 10.40 10.26 10.19 10.45 11.62 11.20
## [421] 10.73 11.20 12.33  5.21 11.32 12.02 10.33 10.73 11.50 10.79 10.83 11.23
## [433] 10.45  8.92 11.07 11.55 10.82 11.19  9.13 10.20 10.84  9.16  8.66 11.11
## [445]  9.96 10.69 10.73 10.73 12.60 11.21 10.02 11.08 10.97 12.68

Typing in all the entries individually would take a lot of time. Instead, we could use two functions that we have already seen, sum() and length().

sum(weather_data$mean_temp) / length(weather_data$mean_temp)
## [1] 10.56586

Fortunately, R provides a much easier way to calculate a mean:

mean(weather_data$mean_temp) # That was easy.
## [1] 10.56586
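
To convince ourselves that both approaches really do the same thing, we can compare them on a small made-up vector. This is only a sketch: the object names temps and temps_with_na and their values are hypothetical and not part of weather_data. It also shows the na.rm argument of mean(), which becomes relevant as soon as a vector contains missing values.

```r
# Hypothetical example values, not taken from weather_data
temps <- c(10.5, 11.2, 9.8, 12.1)

# Mean "by hand": sum of all values divided by the number of values
manual_mean <- sum(temps) / length(temps)

# Mean with the built-in function
builtin_mean <- mean(temps)

# Both approaches agree
all.equal(manual_mean, builtin_mean)

# With a missing value, mean() returns NA ...
temps_with_na <- c(temps, NA)
mean(temps_with_na)

# ... unless we tell R to remove missing values first
mean(temps_with_na, na.rm = TRUE)
```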

But be sure that your vector is numeric. Could you calculate the mean of city?

weather_data$city
##   [1] "Sigmarszell-Zeisertsweiler"          
##   [2] "Obersulm-Willsbach"                  
##   [3] "Röllbach"                            
##   [4] "Padenstedt (Pony-Park)"              
##   [5] "Elzach-Fisnacht"                     
##   [6] "Lippspringe, Bad"                    
##   [7] "Ummendorf"                           
##   [8] "Tholey"                              
##   [9] "Garmisch-Partenkirchen"              
##  [10] "Veilsdorf"                           
##  [11] "Wernigerode"                         
##  [12] "Pelzerhaken"                         
##  [13] "Balingen-Bronnhaupten"               
##  [14] "Kronach"                             
##  [15] "Heckelberg"                          
##  [16] "Kaisersbach-Cronhütte"               
##  [17] "Kleiner Inselsberg"                  
##  [18] "Starkenberg-Tegkwitz"                
##  [19] "Schwandorf"                          
##  [20] "Quickborn"                           
##  [21] "Darmstadt"                           
##  [22] "Staffelstein, Bad-Stublang"          
##  [23] "Geisenheim"                          
##  [24] "Rahden-Kleinendorf"                  
##  [25] "Heinsberg-Schleiden"                 
##  [26] "Eichstätt-Landershofen"              
##  [27] "Parsberg/Oberpfalz-Eglwang"          
##  [28] "Perl-Nennig"                         
##  [29] "Warburg"                             
##  [30] "Altheim, Kreis Biberach"             
##  [31] "Friedrichshafen-Unterraderach"       
##  [32] "Wangerland-Hooksiel"                 
##  [33] "Lenzkirch-Ruhbühl"                   
##  [34] "Neunkirchen-Wellesweiler"            
##  [35] "Boizenburg"                          
##  [36] "Leuchtturm Kiel"                     
##  [37] "Rosengarten-Klecken"                 
##  [38] "Artern"                              
##  [39] "Barth"                               
##  [40] "Schlüchtern-Herolz"                  
##  [41] "Neustadt am Kulm-Filchendorf"        
##  [42] "Düsseldorf"                          
##  [43] "Freudenberg/Main-Boxtal"             
##  [44] "Weißenburg-Emetzheim"                
##  [45] "Querfurt-Mühle Lodersleben"          
##  [46] "Oberhaching-Laufzorn"                
##  [47] "Wusterwitz"                          
##  [48] "Königshofen, Bad"                    
##  [49] "Ostenfeld (Rendsburg)"               
##  [50] "Wuppertal-Buchenhofen"               
##  [51] "Karlshagen"                          
##  [52] "Wolfach"                             
##  [53] "Martinroda"                          
##  [54] "Oberviechtach"                       
##  [55] "Hasenkrug-Hardebek"                  
##  [56] "Waldmünchen"                         
##  [57] "Schorndorf-Knöbling"                 
##  [58] "Blankenrath"                         
##  [59] "Birx/Rhön"                           
##  [60] "Aue"                                 
##  [61] "Kaufbeuren-Oberbeuren"               
##  [62] "Pirmasens"                           
##  [63] "Stötten"                             
##  [64] "Görlitz"                             
##  [65] "Waldems-Reinborn"                    
##  [66] "Pfullendorf"                         
##  [67] "Neubulach-Oberhaugstett"             
##  [68] "Kleiner Feldberg/Taunus"             
##  [69] "Trollenhagen"                        
##  [70] "Bernburg/Saale (Nord)"               
##  [71] "Lahr"                                
##  [72] "Cölbe, Kr. Marburg-Biedenkopf"       
##  [73] "Steinau, Kr. Cuxhaven"               
##  [74] "Lobenstein, Bad"                     
##  [75] "Oberstdorf"                          
##  [76] "Göttingen"                           
##  [77] "Mühldorf"                            
##  [78] "Erfde"                               
##  [79] "Königswinter-Heiderhof"              
##  [80] "Wasserkuppe"                         
##  [81] "Borken in Westfalen"                 
##  [82] "Müncheberg"                          
##  [83] "Bremen"                              
##  [84] "Kiefersfelden-Gach"                  
##  [85] "Grambek"                             
##  [86] "Lichtenhain-Mittelndorf"             
##  [87] "Erfurt-Weimar"                       
##  [88] "Oberharz am Brocken-Stiege"          
##  [89] "Trier-Petrisberg"                    
##  [90] "Kahler Asten"                        
##  [91] "Schneifelforsthaus"                  
##  [92] "Chieming"                            
##  [93] "Moringen-Lutterbeck"                 
##  [94] "Stechlin-Menz"                       
##  [95] "Kempten"                             
##  [96] "Wittstock-Rote Mühle"                
##  [97] "Großenkneten"                        
##  [98] "Müllheim"                            
##  [99] "Möhrendorf-Kleinseebach"             
## [100] "Landshut-Reithof"                    
## [101] "Belm"                                
## [102] "Klipphausen-Garsebach"               
## [103] "Grünow"                              
## [104] "Michelstadt-Vielbrunn"               
## [105] "Potsdam"                             
## [106] "Weihenstephan-Dürnast"               
## [107] "Doberlug-Kirchhain"                  
## [108] "Zwiesel"                             
## [109] "Wittingen-Vorhop"                    
## [110] "Deutschneudorf-Brüderwiese"          
## [111] "Sankt Peter-Ording"                  
## [112] "Marnitz"                             
## [113] "Michelstadt"                         
## [114] "Kissingen, Bad"                      
## [115] "Ruppertsecken"                       
## [116] "Plauen"                              
## [117] "Elster, Bad-Sohl"                    
## [118] "Waghäusel-Kirrlach"                  
## [119] "Feuchtwangen-Heilbronn"              
## [120] "Lennestadt-Theten"                   
## [121] "Berlin Brandenburg"                  
## [122] "Muskau, Bad"                         
## [123] "Waltershausen"                       
## [124] "Kahl/Main"                           
## [125] "Geldern-Walbeck"                     
## [126] "Berlin-Dahlem (FU)"                  
## [127] "Mannheim"                            
## [128] "Würzburg"                            
## [129] "Ueckermünde"                         
## [130] "Naumburg/Saale-Kreipitzsch"          
## [131] "Hermaringen-Allewind"                
## [132] "Aachen-Orsbach"                      
## [133] "Hohwacht"                            
## [134] "Baruth"                              
## [135] "Helmstedt-Emmerstedt"                
## [136] "Ulm-Mähringen"                       
## [137] "Hannover"                            
## [138] "Altomünster-Maisbrunn"               
## [139] "Eslohe"                              
## [140] "Fritzlar/Eder"                       
## [141] "Feldberg/Mecklenburg"                
## [142] "Leuchtturm Alte Weser"               
## [143] "Greifswald"                          
## [144] "Idar-Oberstein"                      
## [145] "Krölpa-Rockendorf"                   
## [146] "Schwäbisch Gmünd-Weiler"             
## [147] "Lenzen/Elbe"                         
## [148] "Andernach"                           
## [149] "Tribsees"                            
## [150] "Schleiz"                             
## [151] "Mühlacker"                           
## [152] "Hümmerich"                           
## [153] "Dillingen/Donau-Fristingen"          
## [154] "Dörnick"                             
## [155] "Pforzheim-Ispringen"                 
## [156] "Bochum"                              
## [157] "Braunlage"                           
## [158] "Dörpen"                              
## [159] "Amberg-Unterammersricht"             
## [160] "Sachsenheim"                         
## [161] "Seehausen"                           
## [162] "Großer Arber"                        
## [163] "Lohr/Main-Halsbach"                  
## [164] "Eppingen-Elsenz"                     
## [165] "Oberzent-Beerfelden"                 
## [166] "Reichshof-Eckenhagen"                
## [167] "Neuburg/Kammel-Langenhaslach"        
## [168] "Schwerin"                            
## [169] "Weimar-Schöndorf"                    
## [170] "Wernigerode-Schierke"                
## [171] "Geringswalde-Altgeringswalde"        
## [172] "Schmieritz-Weltwitz"                 
## [173] "Helgoland"                           
## [174] "Gottfrieding"                        
## [175] "Kirchdorf/Poel"                      
## [176] "Berus"                               
## [177] "Trostberg"                           
## [178] "Dachwig"                             
## [179] "Metzingen"                           
## [180] "Marienberg"                          
## [181] "Duisburg-Baerl"                      
## [182] "Stuttgart (Schnarrenberg)"           
## [183] "Hechingen"                           
## [184] "Hahn"                                
## [185] "Reimlingen"                          
## [186] "Mallersdorf-Pfaffenberg-Oberlindhart"
## [187] "Hersfeld, Bad"                       
## [188] "Itzehoe"                             
## [189] "Merklingen"                          
## [190] "Lübben-Blumenfelde"                  
## [191] "Prackenbach-Neuhäusl"                
## [192] "Kirchberg/Jagst-Herboldshausen"      
## [193] "Ebrach"                              
## [194] "München-Flughafen"                   
## [195] "Jeßnitz"                             
## [196] "Reit im Winkl"                       
## [197] "Wiesbaden-Auringen"                  
## [198] "Wendisch Evern"                      
## [199] "Wacken"                              
## [200] "Hamburg-Fuhlsbüttel"                 
## [201] "Donauwörth-Osterweiler"              
## [202] "Bremervörde"                         
## [203] "Buchenbach"                          
## [204] "Feldberg/Schwarzwald"                
## [205] "Köthen (Anhalt)"                     
## [206] "Heinersreuth-Vollhof"                
## [207] "Zehdenick"                           
## [208] "Gräfenberg-Kasberg"                  
## [209] "Wunstorf"                            
## [210] "Berge"                               
## [211] "Kitzingen"                           
## [212] "Teterow"                             
## [213] "Manderscheid-Sonnenhof"              
## [214] "Renningen-Ihinger Hof"               
## [215] "Twistetal-Mühlhausen"                
## [216] "Bevern, Kr. Holzminden"              
## [217] "Meiningen"                           
## [218] "Kleve"                               
## [219] "Sohland/Spree"                       
## [220] "Langenwetzendorf-Göttendorf"         
## [221] "Weilerswist-Lommersum"               
## [222] "Osterfeld"                           
## [223] "Zugspitze"                           
## [224] "Friesoythe-Altenoythe"               
## [225] "Leinefelde"                          
## [226] "Freiburg/Elbe"                       
## [227] "Waltrop-Abdinghof"                   
## [228] "Salzuflen, Bad"                      
## [229] "Berka, Bad (Flugplatz)"              
## [230] "Tönisvorst"                          
## [231] "Chemnitz"                            
## [232] "Goldberg"                            
## [233] "Nürnberg"                            
## [234] "Barsinghausen-Hohenbostel"           
## [235] "Berleburg, Bad-Stünzel"              
## [236] "Norderney"                           
## [237] "Kall-Sistig"                         
## [238] "Lüdenscheid"                         
## [239] "Klippeneck"                          
## [240] "Braunschweig"                        
## [241] "Saarbrücken-Burbach"                 
## [242] "Alsfeld-Eifa"                        
## [243] "Schwarzburg"                         
## [244] "Wiesenburg"                          
## [245] "Leipzig/Halle"                       
## [246] "Nauheim, Bad"                        
## [247] "Harzburg, Bad"                       
## [248] "Grambow-Schwennenz"                  
## [249] "Uelzen"                              
## [250] "Amerang-Pfaffing"                    
## [251] "Holzkirchen"                         
## [252] "Villingen-Schwenningen"              
## [253] "Bamberg"                             
## [254] "Wittenberg"                          
## [255] "Möckern-Drewitz"                     
## [256] "Fehmarn"                             
## [257] "Lippstadt-Bökenförde"                
## [258] "Singen"                              
## [259] "Arkona"                              
## [260] "Meinerzhagen-Redlendorf"             
## [261] "Freiburg"                            
## [262] "Gevelsberg-Oberbröking"              
## [263] "Lichtentanne"                        
## [264] "Schauenburg-Elgershausen"            
## [265] "Schonungen-Mainberg"                 
## [266] "Lautertal-Oberlauter"                
## [267] "Pommelsbrunn-Mittelburg"             
## [268] "Deuselbach"                          
## [269] "Herzberg"                            
## [270] "Aldersbach-Kramersepp"               
## [271] "Lindenberg"                          
## [272] "Gelbelsee"                           
## [273] "Alfhausen"                           
## [274] "Augsburg"                            
## [275] "Metten"                              
## [276] "Kyritz"                              
## [277] "Gardelegen"                          
## [278] "Eschwege"                            
## [279] "Frankfurt/Main"                      
## [280] "Memmingen"                           
## [281] "Klitzschen bei Torgau"               
## [282] "Straubing"                           
## [283] "Wolfsburg (Südwest)"                 
## [284] "Burgwald-Bottendorf"                 
## [285] "Kubschütz, Kr. Bautzen"              
## [286] "Notzingen"                           
## [287] "Essen-Bredeney"                      
## [288] "Saldenburg-Entschenreuth"            
## [289] "Worms"                               
## [290] "Bassum"                              
## [291] "Manschnow"                           
## [292] "Neukirchen-Hauptschwenda"            
## [293] "Schleswig"                           
## [294] "Jena (Sternwarte)"                   
## [295] "Waibstadt"                           
## [296] "Oy-Mittelberg-Petersthal"            
## [297] "Lübeck-Blankensee"                   
## [298] "Neunkirchen-Seelscheid-Krawinkel"    
## [299] "Neuruppin-Alt Ruppin"                
## [300] "Seesen"                              
## [301] "Zinnwald-Georgenfeld"                
## [302] "Mittelnkirchen-Hohenfelde"           
## [303] "Tirschenreuth-Lodermühl"             
## [304] "Soltau"                              
## [305] "Piding"                              
## [306] "Emmendingen-Mundingen"               
## [307] "Hattstedt"                           
## [308] "Berlin-Buch"                         
## [309] "Ellwangen-Rindelbach"                
## [310] "Genthin"                             
## [311] "Putbus"                              
## [312] "München-Stadt"                       
## [313] "Salzungen, Bad-Gräfen-Nitzendorf"    
## [314] "Langenlipsdorf"                      
## [315] "Aschersleben-Mehringen"              
## [316] "Rothenburg ob der Tauber"            
## [317] "Holzdorf-Bernsdorf"                  
## [318] "Schönhagen (Ostseebad)"              
## [319] "Sandberg"                            
## [320] "Leutkirch-Herlazhofen"               
## [321] "Grainet-Rehberg"                     
## [322] "Simbach/Inn"                         
## [323] "Ohlsbach"                            
## [324] "Bertsdorf-Hörnitz"                   
## [325] "Worpswede-Hüttenbusch"               
## [326] "Schaafheim-Schlierbach"              
## [327] "Gera-Leumnitz"                       
## [328] "Offenbach-Wetterpark"                
## [329] "Günzburg"                            
## [330] "Dresden-Hosterwitz"                  
## [331] "Cuxhaven"                            
## [332] "Neuhaus am Rennweg"                  
## [333] "Hameln-Hastenbeck"                   
## [334] "Borkum-Flugplatz"                    
## [335] "Rostock-Warnemünde"                  
## [336] "Steinhagen-Negast"                   
## [337] "Weinbiet"                            
## [338] "Emden"                               
## [339] "Nideggen-Schmidt"                    
## [340] "Schönwald/Ofr.-Brunn"                
## [341] "Runkel-Ennerich"                     
## [342] "Leipzig-Holzhausen"                  
## [343] "Fichtelberg/Oberfranken-Hüttstadl"   
## [344] "Quedlinburg"                         
## [345] "Sontra"                              
## [346] "Fürstenzell"                         
## [347] "Münsingen-Apfelstetten"              
## [348] "Treuen"                              
## [349] "Siegsdorf-Höll"                      
## [350] "Kaiserslautern"                      
## [351] "Kiel-Holtenau"                       
## [352] "Harzgerode"                          
## [353] "Elpersbüttel"                        
## [354] "Frankfurt/Main-Westend"              
## [355] "Lügde-Paenbruch"                     
## [356] "Nossen"                              
## [357] "Carlsfeld"                           
## [358] "Anklam"                              
## [359] "Ebersberg-Halbing"                   
## [360] "Lüchow"                              
## [361] "Markt Erlbach-Hagenhofen"            
## [362] "Hof"                                 
## [363] "Olsdorf"                             
## [364] "Trier-Zewen"                         
## [365] "Tann/Rhön"                           
## [366] "Saarbrücken-Ensheim"                 
## [367] "Baden-Baden-Geroldsau"               
## [368] "Weiden"                              
## [369] "Arnstein-Müdesheim"                  
## [370] "Bielefeld-Deppendorf"                
## [371] "Lingen-Baccum"                       
## [372] "Dürkheim, Bad"                       
## [373] "Groß Lüsewitz"                       
## [374] "Neuenahr, Bad-Ahrweiler"             
## [375] "Mühlhausen/Thüringen-Görmar"         
## [376] "Sigmaringen-Laiz"                    
## [377] "Olbersleben"                         
## [378] "Hilgenroth"                          
## [379] "Angermünde"                          
## [380] "Brocken"                             
## [381] "Rottweil"                            
## [382] "Diepholz"                            
## [383] "Harburg"                             
## [384] "Elsendorf-Horneck"                   
## [385] "Hamburg-Neuwiedenthal"               
## [386] "Ingolstadt-Manching"                 
## [387] "Lüdinghausen-Brochtrup"              
## [388] "Magdeburg"                           
## [389] "Buchen, Kr. Neckar-Odenwald"         
## [390] "Wittenborn"                          
## [391] "Maisach-Galgen"                      
## [392] "Konstanz"                            
## [393] "Dachsberg-Wolpadingen"               
## [394] "Leck"                                
## [395] "Arnsberg-Neheim"                     
## [396] "Schleswig-Jagel"                     
## [397] "Attenkam"                            
## [398] "Hoyerswerda"                         
## [399] "Cottbus"                             
## [400] "Boltenhagen"                         
## [401] "Köln-Stammheim"                      
## [402] "Löhnberg-Obershausen"                
## [403] "Dippoldiswalde-Reinberg"             
## [404] "Bergzabern, Bad"                     
## [405] "Simmern-Wahlbach"                    
## [406] "Wutöschingen-Ofteringen"             
## [407] "Öhringen"                            
## [408] "Fulda-Horas"                         
## [409] "Werl"                                
## [410] "Ennigerloh-Ostenfelde"               
## [411] "Alfeld"                              
## [412] "Greifswalder Oie"                    
## [413] "Meßstetten-Appental"                 
## [414] "Roth"                                
## [415] "Hiddensee-Vitte"                     
## [416] "Neu-Ulrichstein"                     
## [417] "Weidenbach-Weiherschneidbach"        
## [418] "Waren (Müritz)"                      
## [419] "Münster/Osnabrück"                   
## [420] "Nienburg"                            
## [421] "Falkenberg,Kr.Rottal-Inn"            
## [422] "Groß Berßen"                         
## [423] "Rheinfelden"                         
## [424] "Fichtelberg"                         
## [425] "Lauchstädt, Bad"                     
## [426] "Köln/Bonn"                           
## [427] "Wielenbach (Demollstr.)"             
## [428] "Neuburg an der Donau"                
## [429] "Ahaus"                               
## [430] "Rotenburg (Wümme)"                   
## [431] "Rosenheim"                           
## [432] "Oschatz"                             
## [433] "Eisenach"                            
## [434] "Wunsiedel-Schönbrunn"                
## [435] "Ingelfingen-Stachenhausen"           
## [436] "Berlin-Tempelhof"                    
## [437] "Regensburg"                          
## [438] "Weiskirchen/Saar"                    
## [439] "Hohenpeißenberg"                     
## [440] "Laage-Kronskamp"                     
## [441] "Schipkau-Klettwitz"                  
## [442] "Freudenstadt"                        
## [443] "Teuschnitz"                          
## [444] "Demker"                              
## [445] "Nürburg-Barweiler"                   
## [446] "Coschen"                             
## [447] "Großerlach-Mannenweiler"             
## [448] "Kösching"                            
## [449] "Rheinau-Memprechtshofen"             
## [450] "Dresden-Klotzsche"                   
## [451] "Geisingen"                           
## [452] "Zeitz"                               
## [453] "Weingarten, Kr. Ravensburg"          
## [454] "Rheinstetten"

Let’s try to calculate the mean.

mean(weather_data$city)
## Warning in mean.default(weather_data$city): argument is not numeric or logical:
## returning NA
## [1] NA

It does not work! Even by hand, we could not calculate the mean of a character vector.
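
If you are unsure whether a variable is numeric, you can check before calculating anything. The following sketch uses a small made-up data frame (toy_data and its values are hypothetical stand-ins for weather_data) together with the base R functions class() and is.numeric().

```r
# Hypothetical toy data frame standing in for weather_data
toy_data <- data.frame(
  city = c("A-Stadt", "B-Dorf", "C-Burg"),
  mean_temp = c(10.2, 11.4, 9.9),
  stringsAsFactors = FALSE
)

# class() tells us what kind of vector we are dealing with
class(toy_data$city)
class(toy_data$mean_temp)

# is.numeric() is TRUE only for numeric vectors
is.numeric(toy_data$city)
is.numeric(toy_data$mean_temp)

# Only calculate the mean if the variable is numeric
if (is.numeric(toy_data$mean_temp)) {
  mean(toy_data$mean_temp)
}
```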

Here is an overview of functions for measures of central tendency and variability:

  • Mean: mean()
  • Median: median()
  • Variance: var()
  • Standard Deviation: sd()
  • Range: range()
  • Inter-quartile range: IQR()

You can try them out here:

# Median

median(weather_data$mean_temp)
## [1] 10.64
# Variance

var(weather_data$mean_temp)
## [1] 1.61514
# Standard deviation

sd(weather_data$mean_temp)
## [1] 1.270882
# Range

range(weather_data$mean_temp)
## [1] -2.90 13.14
# Inter Quartile Range (IQR)

IQR(weather_data$mean_temp)
## [1] 1.015
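
Some of these measures are directly related to each other. The sketch below checks two of these relationships on a made-up vector (x and its values are hypothetical): the standard deviation is the square root of the variance, and the inter-quartile range is the distance between the 25% and the 75% quantile.

```r
# Hypothetical example values, not taken from weather_data
x <- c(2, 4, 4, 5, 7, 9, 11)

# The standard deviation is the square root of the variance
all.equal(sd(x), sqrt(var(x)))

# range() returns minimum and maximum; diff() turns them into one width
range(x)
diff(range(x))

# The IQR is the distance between the 75% and the 25% quantile
quartiles <- quantile(x, probs = c(0.25, 0.75))
all.equal(IQR(x), unname(diff(quartiles)))
```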

Unfortunately, there is no built-in function to get the statistical mode (a function called mode() exists, but it returns the storage type of an object). The solutions you will find online are all a bit advanced, so the easiest approach is to look for the mode using a frequency table.

table(weather_data$cold)
## 
##   0   1 
## 440  14

The table() function shows you how often each value occurs in the vector. You can now identify the most frequent value.
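
If you do not want to read the most frequent value off the table yourself, one possible approach is to combine table() with which.max(). This is only a sketch on a made-up vector (the values of cold below are hypothetical), not an official mode function.

```r
# Hypothetical example vector with a clear most frequent value
cold <- c(0, 0, 1, 0, 0, 1, 0)

# Frequency table: how often does each value occur?
freq <- table(cold)

# which.max() finds the position of the largest count;
# names() gives us the value that belongs to that count
mode_value <- names(freq)[which.max(freq)]
mode_value
```

Note that the result is returned as text (a character value), and that which.max() only reports the first value if several values are equally frequent.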

Exercise III: Manipulating data

Now we will work with the weather_data data set. It is already loaded for you and you can use it right away.

  1. Show the variable mean_temp if it is over 10.

  2. Generate a new variable called hot that is 0 for mean temperatures < 10 and 1 for mean temperatures > 10 degrees Celsius.

  3. Have a look at your data set.

Please solve all three steps in the next code chunk.

Exercise IV: Subsetting

This is a little trickier: Can you find the hottest and coldest city in Germany in 2021?

Hint: The functions min() and max() help you to find the minimum and maximum values of a vector or variable. Combine that with your newly learned subsetting skills and you’ll find the answer.
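
To illustrate the general pattern behind this hint, here is a sketch on a made-up data frame. The names toy_data, place and value are hypothetical, so you will still need to adapt the idea to weather_data yourself.

```r
# Hypothetical toy data, not related to weather_data
toy_data <- data.frame(
  place = c("North", "South", "East", "West"),
  value = c(3.5, 8.1, 5.0, 1.2),
  stringsAsFactors = FALSE
)

# Step 1: find the largest value
highest <- max(toy_data$value)

# Step 2: select the place(s) where value equals the largest value
toy_data$place[toy_data$value == highest]

# The same logic works for the smallest value
lowest <- min(toy_data$value)
toy_data$place[toy_data$value == lowest]
```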

Exercise V: Measures of central tendency

We will continue working with the weather data set.

  1. Calculate the mean value of latitude and save the result as mean_latitude.

  2. Calculate the variance of latitude and save the result as var_latitude.

  3. Calculate the standard deviation of latitude and save the result as sd_latitude.

Plotting data

Let’s have a short look at our data again. Remember: head() shows you the first six rows of your data. This is very useful for getting a look at the data structure when your data set has a lot of rows.

head(weather_data)
##                         city longitude latitude mean_temp cold
## 1 Sigmarszell-Zeisertsweiler  9.740446 47.57760     11.14    0
## 2         Obersulm-Willsbach  9.352493 49.12801     12.28    0
## 3                   Röllbach  9.253038 49.76440     11.37    0
## 4     Padenstedt (Pony-Park)  9.925507 54.01884     10.22    0
## 5            Elzach-Fisnacht  8.108840 48.20121     11.32    0
## 6           Lippspringe, Bad  8.838795 51.78542     11.12    0

Plots for bivariate distributions

Scatterplots

Now we can create a simple scatterplot:

plot(
  x = weather_data$longitude,
  y = weather_data$mean_temp
)

To get a nicer plot, we can adjust many things. We suggest always making those adjustments explicitly and in the same order.

plot(
  x = weather_data$longitude,
  y = weather_data$mean_temp,
  type = "p", # This explicitly says that we want points. You could also try "l".
  main = "Mean temperatures of German cities", # This adds a title to the plot
  xlab = "Longitude (West - East)", # This labels the x-axis.
  ylab = "Mean Temperature in 2021", # What does this do then?
  las = 1, # This affects the tick labels of the y-axis.
  pch = 19, # Here we choose what symbols we want to plot.
  col = "black", # What color should the symbols have?
  frame = F # No box around the plot.
)

Adding Color to Plots with Viridis

We can also adjust the colors. Let’s highlight Mannheim!

Pro Tip: To color up your data visualizations, use the viridis-package.

Viridis colors make plots easier to read for people with color blindness, and they print well in greyscale. You probably don’t want to have plots like this:

We first need a vector that gives us the right colors with respect to the city variable.

library(viridis)
## Loading required package: viridisLite
# we need two colors, this is how we get them:
two_colors <- viridis(2)

two_colors # these are so-called HEX color codes
## [1] "#440154FF" "#FDE725FF"
# we use the first color for Mannheim and the second for all other cities
mannheim_color <- ifelse(weather_data$city == "Mannheim", two_colors[1], two_colors[2])

# let's have a look:
table(mannheim_color) 
## mannheim_color
## #440154FF #FDE725FF 
##         1       453
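
To see what ifelse() does element by element, here is a sketch with a tiny made-up vector of city names. The objects cities, cold and cold_color are hypothetical; the two HEX codes are simply copied from the viridis(2) output above so that the example runs without loading any package. The sketch also shows an alternative for 0/1 variables such as cold: using the variable as an index into the color vector.

```r
# HEX codes as returned by viridis(2) in the text above
two_colors <- c("#440154FF", "#FDE725FF")

# Hypothetical toy vector of city names
cities <- c("Mannheim", "Bremen", "Potsdam")

# ifelse() checks every element: TRUE -> first color, FALSE -> second color
ifelse(cities == "Mannheim", two_colors[1], two_colors[2])

# Alternative for a 0/1 indicator variable: use it as an index.
# Adding 1 turns the values 0/1 into the positions 1/2 of the color vector.
cold <- c(0, 1, 0) # hypothetical indicator values
cold_color <- two_colors[cold + 1]
cold_color
```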

Now we can use this vector to give Mannheim a different color than all other cities:

plot(
  x = weather_data$longitude,
  y = weather_data$mean_temp,
  type = "p", # This explicitly says that we want points. You could also try "l".
  main = "Mean temperatures of German cities", # This adds a title to the plot
  xlab = "Longitude (West - East)", # This labels the x-axis.
  ylab = "Mean Temperature in 2021", # What does this do then?
  las = 1, # This affects the tick labels of the y-axis.
  pch = 19, # Here we choose what symbols we want to plot.
  col = mannheim_color, # Instead of just black we now use the color vector.
  frame = F # No frame around the plot.
)

Now that we use different colors, we also need a legend to know which color is which.

plot(
  x = weather_data$longitude,
  y = weather_data$mean_temp,
  type = "p", # This explicitly says that we want points. You could also try "l".
  main = "Mean temperatures of German cities", # This adds a title to the plot
  xlab = "Longitude (West - East)", # This labels the x-axis.
  ylab = "Mean Temperature in 2021", # What does this do then?
  las = 1, # This affects the tick labels of the y-axis.
  pch = 19, # Here we choose what symbols we want to plot.
  col = mannheim_color, # Instead of just black we now use the color vector.
  frame = F # No frame around the plot.
)
legend(
  "bottomleft", # Locate the legend in the bottom-left corner.
  legend = c("Mannheim", "other"), # Give it labels.
  pch = 19, # Specify symbols as in the scatterplot.
  col = two_colors, # Specify colors.
  bty = "n" # No box around the legend.
)

We can also label the point for Mannheim directly in the plot:

plot(
  x = weather_data$longitude,
  y = weather_data$mean_temp,
  type = "p", # This explicitly says that we want points. You could also try "l".
  main = "Mean temperatures of German cities", # This adds a title to the plot
  xlab = "Longitude (West - East)", # This labels the x-axis.
  ylab = "Mean Temperature in 2021", # What does this do then?
  las = 1, # This affects the tick labels of the y-axis.
  pch = 19, # Here we choose what symbols we want to plot.
  col = mannheim_color, # Instead of just black we now use the color vector.
  frame = F # No frame around the plot.
)
# we want to label the point that refers to Mannheim
# We can do that with the text() function,
# But we need to subset the data, so that only Mannheim gets labelled,
# and no other city
text(
  x = weather_data$longitude[weather_data$city == "Mannheim"], # subset Mannheim
  y = weather_data$mean_temp[weather_data$city == "Mannheim"], # subset Mannheim
  labels = "Mannheim", # label Mannheim as "Mannheim"
  pos = 4 # position the label to the right of the point
)

Plots for univariate distributions

Histograms

Now we want to visualize mean temperature with a histogram. This is how you get a standard histogram:

hist(x = weather_data$mean_temp) # That's intuitive, but does not look too great

Again, we can adjust many things to make it nicer.

hist(
  x = weather_data$mean_temp, # For a histogram we only specify x.
  breaks = 50, # suggest the number of bins (R may adjust it to get round break points)
  main = "A Histogram",
  xlab = "Mean temperature in degree Celsius",
  ylab = "Number of observations",
  las = 1, # draw the axis tick labels horizontally
  col = viridis(1), # One color only (first color from viridis)
  border = "white" # That's the color of the bin borders.
)
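
When breaks is a single number, R treats it as a suggestion and picks round break points close to it. To check which bins were actually used, we can ask hist() to return the histogram object without drawing it. The sketch below uses made-up data (x is a hypothetical stand-in for mean_temp).

```r
set.seed(1) # make the made-up data reproducible
x <- rnorm(200, mean = 10, sd = 1) # hypothetical stand-in for mean_temp

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, breaks = 50, plot = FALSE)

h$breaks         # the bin boundaries R actually chose
length(h$counts) # the resulting number of bins (may differ from 50)
sum(h$counts)    # every observation falls into exactly one bin
```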

Density Plots

We can also create density plots.

plot(
  density(weather_data$mean_temp), # density() takes care of x, y and type.
  main = "A Simple Density Plot",
  xlab = "Mean temperature in degree Celsius",
  ylab = "", # The y-axis is not really meaningful here.
  col = viridis(1),
  lwd = 2, # Control the width of the line
  frame = F,
  yaxt = "n" # Remove the y-axis.
)

And we can also fill the area underneath the curve:

plot(
  density(weather_data$mean_temp), # density() takes care of x, y and type.
  main = "A Simple Density Plot",
  xlab = "Mean temperature in degree Celsius",
  ylab = "", # The y-axis is not really meaningful here.
  col = viridis(1),
  lwd = 2, # Control the width of the line
  frame = F,
  yaxt = "n" # Remove the y-axis.
)

polygon(density(weather_data$mean_temp), 
        col = viridis(1, alpha = 0.5) # same color but 50% transparent
        )

…and Boxplots

boxplot(
  x = weather_data$mean_temp, # As for histograms we only specify x.
  main = "Boxplot of Mean temperature in degree Celsius",
  ylab = "Mean temperature in degree Celsius",
  las = 1,
  col = plasma(1),
  frame = F
)

Or a horizontal boxplot.

boxplot(
  x = weather_data$mean_temp,
  horizontal = T, # With horizontal = T we rotate the boxplot.
  main = "Horizontal Boxplot of Mean temperature in degree Celsius",
  xlab = "Mean temperature in degree Celsius",
  las = 1,
  frame = FALSE
)
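
To see which numbers a boxplot is built from, you can use boxplot.stats(). This is a small sketch with simulated data. The vector temp and the artificial extreme value of 45 are assumptions for illustration, not part of the weather data.

```r
# Simulated temperatures plus one artificial extreme value (hypothetical data).
set.seed(1)
temp <- c(rnorm(200, mean = 10, sd = 8), 45)

b <- boxplot.stats(temp)

# Five numbers: lower whisker, lower hinge, median, upper hinge, upper whisker.
b$stats

# Observations beyond the whiskers, which boxplot() draws as single points.
b$out
```

By default the whiskers extend to the most extreme observation that lies within 1.5 times the box length from the box. Everything further out ends up in b$out.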

You learned in the lecture that boxplots have some disadvantages.

Violin plots are a very nice alternative!

This is how you get them:

library(vioplot)
vioplot(
  x = weather_data$mean_temp,
  horizontal = TRUE, # With horizontal = TRUE we rotate the violin plot.
  main = "Horizontal Violin Plot of Mean temperature in degree Celsius",
  xaxt = "n",
  xlab = "Mean temperature in degree Celsius",
  bty = "n",
  axes = FALSE,
  names = "",
  border = NA
)
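
Why can a violin plot tell you more than a boxplot? One commonly cited limitation of boxplots (we assume here that this is among the disadvantages from the lecture) is that they hide the shape of the distribution. The sketch below uses made-up data with two clearly separated groups. The five boxplot numbers look unremarkable, but the estimated density at the median is tiny compared to the density at the two peaks, which is exactly what the outline of a violin plot would show.

```r
# Hypothetical bimodal data: two groups centered at 0 and 10.
set.seed(1)
two_groups <- c(rnorm(300, mean = 0, sd = 1), rnorm(300, mean = 10, sd = 1))

# The boxplot summary: the median falls between the two groups.
b <- boxplot.stats(two_groups)
b$stats

# The density estimate (the outline of a violin plot) at three positions.
d <- density(two_groups)
dens_at <- function(x0) approx(d$x, d$y, xout = x0)$y

dens_at(0)                  # high: center of the first group
dens_at(median(two_groups)) # very low: almost no data here
dens_at(10)                 # high: center of the second group
```

You can pass two_groups to boxplot() and vioplot() to compare the two pictures yourself.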

Exercise VI: Plotting

Okay, last round of exercises for today:

  1. Make a histogram of the latitude variable.

  2. Make the plot look nice (label the axes, add a main title, choose colors…)

Recap

What we learned in this session:

  1. How to work with R and GitHub.
  2. Assigning objects in R.
  3. Different data structures in R.
  4. How to get to single elements within data structures.
  5. Working with data frames.
  6. How to load a data set into R.
  7. How to make nice looking plots in R.

What you will do in your homework.

The first lab session and this script should equip you with all the tools (and lines of code) to tackle the first homework assignment.

Copy the lines of code that worked for something similar. Then, adjust the code according to your problem.

In terms of content, your homework asks you to inspect a data set on US presidential elections. You will calculate some measures of central tendency and variability. Finally, you will produce some nice plots.

It is best to get started with your homework as soon as possible once it has been handed out on Friday.

Try to write the R code first. We will provide you with a .Rmd template to hand in your results.

In order to pass the homework assignment you need to tackle ALL problems of the problem set. You also need to get most of the problems right (or at least show us that you made a serious attempt to get them right).

Closing remarks.

If you have any questions concerning the lecture or the tutorial please post them on Slack. We will answer them on a regular basis.

Do not hesitate to come to the office hours!

And always remember: if you have a question, it is never a stupid question. In fact, most of your fellow students probably have the same or a similar question. By asking it, you help everyone in this class.